Disk array device

ABSTRACT

Data blocks and redundant data are distributed across disk drives. In response to a first read request transmitted from a host device, a controller issues second read requests to read the data blocks and the redundant data from the disk drives. Further, the controller detects, from among the disk drives, the disk drive from which reading of the data block or redundant data is no longer required, and issues a read termination command to the detected disk drive to terminate reading therefrom. In a disk array device with such a structure, even when reading one piece of parity data takes much time, the delay does not affect other read operations.

BACKGROUND OF THE INVENTION

[0001] 1. Field of the Invention

[0002] This invention relates to disk array devices and, more specifically, to a disk array device in which multiple disks (typically, magnetic disks or optical disks) construct a disk array, capable of storing a large volume of data, transferring data at high speed, and further providing higher system reliability.

[0003] 2. Description of the Background Art

[0004] Typical disk array devices include a RAID (Redundant Array of Inexpensive Disks). The RAID is discussed in detail in “A Case for Redundant Arrays of Inexpensive Disks”, by David A. Patterson, Garth Gibson, and Randy H. Katz, University of California Berkeley, December 1987, among others. Six basic architectures of the RAID, from levels 0 to 5, have been defined. Described below is how a RAID adopting the level 3 architecture (hereinafter referred to as RAID-3) controls input/output of data. FIG. 69 is a block diagram showing the typical structure of the RAID-3. In FIG. 69, the RAID includes a controller 6901 and five disk drives 6902A, 6902B, 6902C, 6902D, and 6902P. A host device is connected to the controller 6901, making read/write requests of data to the RAID. When receiving data to be written, the controller 6901 divides the data into data blocks. The controller 6901 generates redundant data using these data blocks. After creation of the redundant data, each data block is written into the disk drives 6902A to 6902D. The redundant data is written into the disk drive 6902P.

[0005] Described next is the procedure of creating redundant data, with reference to FIGS. 70a and 70b. Data to be written arrives at the controller 6901 in units of a predetermined size (2048 bytes, in this description). Here, as shown in FIG. 70a, the currently-arrived data is called D-1. The data D-1 is divided into four by the controller 6901, thereby creating four data blocks D-A1, D-B1, D-C1, and D-D1. Each data block has a data length of 512 bytes.

[0006] The controller 6901 then creates redundant data D-P1 from the data blocks D-A1, D-B1, D-C1, and D-D1 by executing the calculation given by

D-P1(i) = D-A1(i) xor D-B1(i) xor D-C1(i) xor D-D1(i)   (1)

[0007] Here, since each of the data blocks D-A1, D-B1, D-C1, and D-D1 and the redundant data D-P1 has a data length of 512 bytes, i takes on natural numbers from 1 to 512. For example, when i=1, the controller 6901 calculates the redundant data D-P1(1) using the first byte (D-A1(1), D-B1(1), D-C1(1), and D-D1(1)) of each of the data blocks D-A1, D-B1, D-C1, and D-D1. Here, D-P1(1) is the first byte of the redundant data. When i=2, the controller 6901 calculates the redundant data D-P1(2) using the second byte (D-A1(2), D-B1(2), D-C1(2), and D-D1(2)) of each of the data blocks. Thereafter, the controller 6901 repeats the calculation given by equation (1) up to the last byte (the 512th byte) of the data blocks to calculate the redundant data D-P1(1), D-P1(2), . . . D-P1(512). The controller 6901 sequentially arranges the calculated bytes D-P1(1), D-P1(2), . . . D-P1(512) to generate the redundant data D-P1. As is clear from the above, the redundant data D-P1 is the parity of the data blocks D-A1, D-B1, D-C1, and D-D1.
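The byte-wise XOR of equation (1) can be illustrated with a short Python sketch (the sketch, its names, and the sample data are illustrative only and are not part of the disclosure):

    def make_parity(blocks):
        # XOR corresponding bytes of the data blocks, per equation (1).
        parity = bytearray(len(blocks[0]))
        for block in blocks:
            for i, byte in enumerate(block):
                parity[i] ^= byte
        return bytes(parity)

    # Divide 2048 bytes of sample data D-1 into four 512-byte blocks,
    # then compute the redundant (parity) block D-P1.
    data = bytes(range(256)) * 8                      # 2048 bytes
    d_a1, d_b1, d_c1, d_d1 = (data[i:i + 512] for i in range(0, 2048, 512))
    d_p1 = make_parity([d_a1, d_b1, d_c1, d_d1])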

[0008] The controller 6901 stores the created data blocks D-A1, D-B1, D-C1, and D-D1 in the disk drives 6902A, 6902B, 6902C, and 6902D, respectively, and stores the generated redundant data D-P1 in the disk drive 6902P, as shown in FIG. 70b.

[0009] The controller 6901 further controls reading of data. Here, assume that the controller 6901 is requested by the host device to read the data D-1. In this case, when each of the disk drives 6902A, 6902B, 6902C, and 6902D operates normally, the controller 6901 reads the data blocks D-A1, D-B1, D-C1, and D-D1 from the disk drives 6902A, 6902B, 6902C, and 6902D, respectively. The controller 6901 assembles the read data blocks D-A1, D-B1, D-C1, and D-D1 to compose the data D-1 of 2048 bytes, and transmits the composed data D-1 to the host device.

[0010] There is a possibility that a failure or fault may occur in any disk drive. Here, assume that the disk drive 6902C has failed and the host device sends a read request for the data D-1. In this case, the controller 6901 first tries to read the data blocks D-A1, D-B1, D-C1, and D-D1 from the disk drives 6902A, 6902B, 6902C, and 6902D, respectively. However, since the disk drive 6902C has failed, the data block D-C1 cannot be read therefrom. Assume here, however, that the data blocks D-A1, D-B1, and D-D1 are read from the disk drives 6902A, 6902B, and 6902D normally. When recognizing that the data block D-C1 cannot be read, the controller 6901 reads the redundant data D-P1 from the disk drive 6902P.

[0011] The controller 6901 then recovers the data block D-C1 by executing the calculation given by the following equation (2), using the data blocks D-A1, D-B1, and D-D1 and the redundant data D-P1.

[0012] D-C1(i) = D-A1(i) xor D-B1(i) xor D-D1(i) xor D-P1(i)   (2)

[0013] Here, since each of the data blocks D-A1, D-B1, and D-D1 and the redundant data D-P1 has a data length of 512 bytes, i takes on natural numbers from 1 to 512. The controller 6901 calculates the bytes D-C1(1), D-C1(2), . . . D-C1(512) by repeatedly executing the calculation given by equation (2) from the first byte to the 512th byte, and recovers the data block D-C1 from these calculation results. As a result, all of the data blocks D-A1 to D-D1 are held in the controller 6901. The controller 6901 assembles the stored data blocks D-A1 to D-D1 to compose the data D-1 of 2048 bytes, and transmits the composed data D-1 to the host device.
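Because XOR is its own inverse, equation (2) reuses the same routine as the sketch above (again illustrative only):

    # Recover the lost block D-C1 from the surviving blocks and the parity.
    recovered_c1 = make_parity([d_a1, d_b1, d_d1, d_p1])
    assert recovered_c1 == d_c1    # the lost block is restored byte for byte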

[0014] As described above, there is a possibility that the RAID in FIG. 69 cannot read the requested data block from a faulty disk drive (any one of the disk drives 6902A to 6902D). The RAID, however, executes the calculation of parity given by equation (2) using the data blocks read from the other four normal disk drives and the redundant data. The calculation of parity allows the RAID to recover the data block stored in the faulty disk drive.

[0015] In recent years, the RAID architecture, as an example of a disk array, has often been implemented in video servers, which provide video on a user's request. In video servers, the data stored in the disk drives 6902A to 6902D of the RAID is of two types: video data and computer data (typically, video titles and total playing times). Since video data and computer data have different characteristics, the requirements on the RAID system differ between reading video data and reading computer data.

[0016] More specifically, computer data is required to be reliably transmitted to the host device. That is, when a data block of computer data cannot be read, the RAID has to recover the data block by executing the calculation of parity. For this purpose, the RAID may take some time to transmit the computer data to the host device. On the other hand, video data is replayed as video at the host device. When part of the video data arrives late at the host device, the video being replayed at the host device is interrupted. More specifically, video data is in general far larger in size than the 2048 bytes read at one time; the video data is composed of a number of 2048-byte units. Therefore, when requesting the video data to be replayed, the host device has to make a read request for 2048 bytes of data several times. The RAID, in turn, has to read the video data from the disk drives 6902A to 6902D within a predetermined time from the arrival of each read request. If reading of the 2048 bytes of data is delayed even once, the video being replayed at the host device is interrupted. Therefore, the RAID is required to sequentially transmit the 2048-byte units composing the video data to the host device. Described below are RAID systems disclosed in Japanese Patent Laying-Open No. 2-81123 and No. 9-69027, which satisfy such requirements.

[0017] A first RAID disclosed in Japanese Patent Laying-Open No. 2-81123 is now described. The first RAID includes a disk drive group composed of a plurality of disk drives. The disk drive group includes a plurality of disk drives for storing data (hereinafter referred to as data-drives) and a disk drive for storing redundant data created from the data (hereinafter referred to as the parity-drive). When reading data from the plurality of data-drives, the first RAID checks whether reading from one of the data-drives is delayed for more than a predetermined time after the reading from the other data-drives starts. The first RAID determines that a data-drive in which reading is delayed for more than the predetermined time is a faulty drive. After detecting the faulty drive, the first RAID recovers the data to be read from the faulty drive using the data in the other data-drives and the redundant data in the parity-drive.

[0018] As shown in FIG. 71a, the first RAID determines that the data-drive D has failed when the data-drive D does not start reading after the lapse of the predetermined time from the start of the fourth reading (data-drive B). To recover the data block of the data-drive D, the first RAID executes the calculation of parity. In general disk drives, however, the time from start to end of reading is not constant. Some disks may complete reading in a short period of time, while others may take a long time to complete reading after several failures. Therefore, in the first RAID, as shown in FIG. 71b, even though the parity-drive P starts reading earlier than the data-drive B, which starts reading fourth, the data-drive B may complete its reading earlier than the parity-drive P. In this case, even after the lapse of the predetermined time from when the data-drive B starts reading, the redundant data has not been read from the parity-drive P. Therefore, the first RAID cannot recover the data block of the data-drive D. As a result, transmission of the data composing the video data being read is delayed, and the video being replayed at the host device might be interrupted.

[0019] A second RAID disclosed in Japanese Patent Laying-Open No. 9-69027 is now described. The second RAID also includes a plurality of data-drives for storing data and a parity-drive for storing redundant data created from the data. The second RAID does not read the redundant data from the parity-drive under normal conditions. That is, when a read request arrives, the second RAID tries to read the data blocks from the plurality of data-drives. The second RAID previously stores a time (hereinafter referred to as the predetermined time) by which the plurality of data-drives have to have completed reading. In some cases, the second RAID detects a data-drive which has not completed reading after the lapse of the predetermined time from the time of transmission of a read request to each data-drive. In this case, the second RAID reads the redundant data from the parity-drive to recover the data block which has not yet been completely read.

[0020] However, reading of the redundant data starts only after the lapse of the predetermined time (after timeout) from the time of transmission of the read request for the data block. Therefore, as shown in FIG. 72a, it disadvantageously takes much time to recover the unread data block. Furthermore, in some cases, the second RAID successfully reads a data block immediately after the timeout, as shown in FIG. 72b. In such a case, the second RAID could transmit the data faster by using the data block read immediately after the timeout. Once reading of the redundant data has started, however, the second RAID does not use the data block read immediately after the timeout, and as a result, data transmission to the host device may be delayed. This delay may cause interruption of the video being replayed at the host device.

[0021] In most cases, in a disk drive where reading of the data block is delayed, read requests subsequent to the read request currently being processed wait for read operation. Therefore, when the disk drive fails to read the data block and retries reading of the data block, processing of the subsequent read requests is delayed. As is evident from the above, in conventional disk array devices including the above first and second RAIDs, a read failure may affect subsequent reading.

[0022] Referring back to FIG. 69, the controller 6901 stores the four data blocks D-A1 to D-D1 and the redundant data D-P1 in the disk drives 6902A to 6902D and 6902P, respectively. The four data blocks D-A1 to D-D1 and the redundant data D-P1 are generated from the same data D-1 of 2048 bytes. Thus, a set of data blocks and redundant data generated based on the same data received from a host device is herein called a parity group. Also, a set of a plurality of disk drives in which data blocks and redundant data of the same parity group are written is herein called a disk group.

[0023] In a disk array device such as a RAID, a failure may occur in any disk drive therein. The disk array device, however, can recover the data block of the faulty disk drive by executing the calculation of parity using the other data blocks and the redundant data of the same parity group. In the above description, the disk array device assembles the data to be transmitted to the host device using the recovered data block. If the faulty disk drive is left as it is, the calculation of parity is executed whenever an attempt is made to read a data block from the faulty disk drive, which takes much time. As a result, data transmission to the host device is delayed, and the video being replayed at the host device is interrupted. Therefore, some disk array devices execute reconstruction processing. In reconstruction processing, the data block or the redundant data in the faulty disk drive is recovered, and the recovered data block or redundant data is rewritten in another disk drive or a normal area in the faulty disk drive.

[0024] However, when another failure occurs in another disk drive of the same parity group while the defective disk drive is left as it is, reconstruction cannot be executed. Therefore, reconstruction is required to be executed as early as possible. An example of such reconstruction is disclosed in Japanese Patent Laying-Open No. 5-127839. A disk array device disclosed in this publication (hereinafter referred to as the first disk array device) includes a disk array composed of a plurality of disk drives, and a disk controller for controlling the disk array. The disk controller monitors the state of operation of the disk array. When reconstruction is required, the disk controller selects and executes one of three types of reconstruction methods according to the state of operation of the disk array. In the first method, reconstruction occurs during idle time of the array. In the second method, reconstruction is interleaved between current data-area accessing operations of the array at a rate which is inversely proportional to the activity level of the array. In the third method, the data are reconstructed when a data area being accessed is a data area needing reconstruction.

[0025] As described above, in some cases, both computer data and video data are written in each disk drive of the disk array device. Therefore, both read requests for reading the computer data and those for reading the video data arrive at the disk array device from the host device. When a large number of read requests for the computer data arrive, the disk array device has to execute reading of the computer data repeatedly, and as a result, reading of the video data may be delayed. This delay may cause interruption of the video being replayed at the host device.

[0026] The first disk array device executes reconstruction on the faulty disk drive while processing read requests transmitted from the host device. Such reconstruction is executed on all disk drives of the same disk group in one operation. That is, reconstruction cannot be executed unless all disk drives of the same disk group are in an idle state.

[0027] In RAID-4 or RAID-5, each disk drive operates independently, and therefore even if one of the disk drives is in an idle state, the other disk drives of the same disk group may be under load. As a result, the first disk array device cannot take sufficient time to execute reconstruction, and thus efficient reconstruction cannot be made.

[0028] Further, a conventional disk array device may execute reassignment. The structure of a disk array device executing reassignment is similar to that shown in FIG. 69. Reassignment processing is now described in detail. Each disk drive composing a disk array has recording areas, in which a defect may occur for various reasons. Since the disk drive cannot read/write a data block or redundant data from/in a defective area, an alternate recording area is reassigned to the defective recording area. In the alternate recording area, the data block or redundant data stored in the defective recording area, or to be written in the defective area, is stored. Two types of such reassignment are known.

[0029] One type of reassignment is so-called auto-reassign, executed by each disk drive composing the disk array. Each disk drive previously reserves part of its recording areas as alternate areas. When the data block or redundant data cannot be read/written from/in the recording area specified by the controller, the disk drive assumes that the specified area is defective. When detecting the defective area, the disk drive selects one of the reserved alternate areas, and assigns the selected alternate area to the detected defective area.

[0030] The other type of reassignment is executed by the controller. The controller previously reserves part of the recording areas as alternate areas, and manages information for specifying the alternate areas. When the disk drive cannot access the recording area specified by the controller, the disk drive notifies the controller that the recording area is defective. When receiving the notification of the defective area, the controller selects one of the alternate areas from the managed information, and reassigns the selected alternate area to the defective area.

[0031] In some recording areas, reading or writing may eventually be successful if the disk drive repeats access to these recording areas (that is, if the disk drive takes much time to access them). In the above two types of reassignment, however, an alternate area cannot be assigned to a recording area which the disk drive takes much time to access, because reading/writing will eventually succeed even though much time is required. When a data block composing video data is stored in such a recording area, however, it takes much time to read the data block. As a result, the video being replayed at the host device may be interrupted.

SUMMARY OF THE INVENTION

[0032] Therefore, an object of the present invention is to provide a disk array device capable of reading data (data blocks or redundant data) from a disk array to transmit the same to a host device, and of writing data from the host device in the disk array, in a short period of time.

[0033] The present invention has the following features to solve the problem above.

[0034] A first aspect of the present invention is directed to a disk array device executing read operation for reading data recorded therein in response to a first read request transmitted thereto, the disk array device having recorded therein data blocks generated by dividing the data and redundant data generated from the data blocks, comprising:

[0035] m disk drives across which the data blocks and the redundant data are distributed; and

[0036] a control part controlling the read operation;

[0037] the control part

[0038] issuing second read requests to read the data blocks and the redundant data from the m disk drives in response to the first read request sent thereto;

[0039] detecting, from among the m disk drives, the disk drive from which reading of the data block or the redundant data is no longer necessary; and

[0040] issuing a read termination command to the detected disk drive to terminate reading therefrom.

[0041] As described above, in the first aspect, when it is determined that reading of one of the data blocks or the redundant data is not necessary, this reading is terminated. Therefore, the disk drive which terminated this reading can proceed to the next read operation. Thus, it is possible to provide a disk array device in which, even if reading in one disk drive is delayed, this delay does not affect other reading.

[0042] According to a second aspect, in the first aspect,

[0043] when (m-1) of the disk drives complete reading,

[0044] the control part

[0045] determines that reading being executed in the one remaining disk drive is no longer necessary; and

[0046] issues a read termination command to the remaining disk drive.

[0047] As described above, in the second aspect, when reading in one disk drive takes too much time, this reading is also terminated. Thus, it is possible to provide a disk array device in which, even if reading in one disk drive is delayed, this delay does not affect other reading.
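The behavior of the first and second aspects might be sketched as follows; the drive interface (read_async, terminate_read) is an assumption made for illustration and does not appear in the disclosure:

    import threading

    def read_parity_group(drives, address):
        # Issue second read requests to all m drives; once (m-1) complete,
        # issue a read termination command to the one remaining drive.
        m = len(drives)
        done = []
        lock = threading.Lock()

        def on_complete(index):
            with lock:
                done.append(index)
                if len(done) == m - 1:
                    remaining = next(i for i in range(m) if i not in done)
                    drives[remaining].terminate_read()

        for i, drive in enumerate(drives):
            drive.read_async(address, callback=lambda data, i=i: on_complete(i))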

[0048] According to a third aspect, in the first aspect,

[0049] when detecting that two or more of the disk drives cannot complete reading,

[0050] the control part

[0051] determines that reading being executed in the other disk drives is no longer necessary; and

[0052] issues a read termination command to the determined disk drives.

[0053] In the third aspect, when the calculation of parity cannot be executed, reading presently being executed can be terminated. Therefore, since unnecessary reading is not continued, it is possible to provide a disk array device in which unnecessary reading does not affect other reading.

[0054] According to a fourth aspect, in the first aspect, when (m-1) of the disk drives complete reading,

[0055] the control part

[0056] determines that reading not yet being executed in the one remaining disk drive is no longer necessary; and

[0057] issues a read termination command to the remaining disk drive.

[0058] In the fourth aspect, since unnecessary reading is not continued, it is possible to provide a disk array device in which unnecessary reading does not affect other reading.

[0059] A fifth aspect of the present invention is directed to a disk array device executing read operation for reading data recorded therein in response to a first read request from a host device, the disk array device having recorded therein data blocks generated by dividing the data and redundant data generated from the data blocks, comprising:

[0060] m disk drives across which the data blocks and the redundant data are distributed;

[0061] a parity calculation part operating calculation of parity from (m-2) of the data blocks and the redundant data to recover one remaining data block; and

[0062] a control part controlling the read operation;

[0063] the control part

[0064] issuing second read requests to read the data blocks and the redundant data from the m disk drives in response to the first read request sent thereto;

[0065] when (m-1) of the disk drives complete reading, detecting whether a set of the data blocks and the redundant data has been read from the (m-1) disk drives;

[0066] when detecting that the set of the data blocks and the redundant data has been read, issuing a recovery instruction to the parity calculation part to recover the data block not read from the one remaining disk drive, after waiting for a predetermined time period from a time of detection; and

[0067] when the one remaining data block is recovered by the calculation of parity in the parity calculation part, executing operation for transmitting the data to the host device; wherein

[0068] the predetermined time period is selected so as to ensure data transmission to the host device without delay.

[0069] In the fifth aspect, after a set of the data blocks and redundant data is read from (m-1) disk drives, the controller waits for a predetermined time for the remaining one data block to be read. If the remaining one data block has been read within the predetermined time, the calculation of parity is not required. Thus, it is possible to reduce the number of parity calculation operations.
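A sketch of the fifth aspect's waiting step (illustrative only; make_parity is the routine sketched earlier, and the polling loop stands in for the controller's internal scheduling):

    import time

    def finish_read(results, missing, wait_seconds):
        # results: dict drive index -> block, filled in by the (m-1) reads
        # that have completed; `missing` is the index of the slow drive.
        deadline = time.monotonic() + wait_seconds     # predetermined time
        while time.monotonic() < deadline:
            if missing in results:                     # slow read arrived
                return results                         # no parity needed
            time.sleep(0.001)
        # Predetermined time elapsed: recover the block by parity.
        survivors = [blk for i, blk in results.items() if i != missing]
        results[missing] = make_parity(survivors)
        return results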

[0070] According to a sixth aspect, in the fifth aspect,

[0071] when detecting that the set of the data blocks and the redundant data has not been read, the control part transmits the data to the host device without waiting for the predetermined time period from the time of detection.

[0072] In the sixth aspect, if only data blocks are read from the (m-1) disk drives, the controller does not wait for the predetermined time period but transmits the data to the host device. Therefore, it is possible to achieve a disk array device capable of reading a larger volume of data per unit of time.

[0073] According to a seventh aspect, in the fifth aspect,

[0074] the predetermined time period is selected based on a start of reading in each of the disk drives and a probability of completing the reading.

[0075] In the seventh aspect, in most cases, the remaining one data block is read. Therefore, it is possible to reduce the number of parity calculation operations.

[0076] An eighth aspect of the present invention is directed to a disk array device executing read operation for reading data recorded therein in response to a first read request from a host device, the disk array device having recorded therein data blocks generated by dividing the data and redundant data generated from the data blocks, comprising:

[0077] m disk drives across which the data blocks and the redundant data are distributed;

[0078] a parity calculation part operating calculation of parity from (m-2) of the data blocks and the redundant data to recover one remaining data block; and

[0079] a control part controlling the read operation;

[0080] the control part

[0081] issuing second read requests to read the data blocks and the redundant data from the m disk drives in response to the first read request sent thereto;

[0082] when (m-1) of the disk drives complete reading, detecting whether a set of the data blocks and the redundant data has been read from the (m-1) disk drives;

[0083] when detecting that the set of the data blocks and the redundant data has been read, issuing a recovery instruction to the parity calculation part to recover the data block not read from the one remaining disk drive, after waiting for a predetermined time period from a time of detection; and

[0084] when the one remaining data block is recovered by the calculation of parity in the parity calculation part, executing operation for transmitting the data to the host device; wherein the recovery instruction is issued while the parity calculation part is not operating calculation of parity.

[0085] In the eighth aspect, the controller reliably issues a recovery instruction only when the calculation of parity is not being executed. This prevents a needless load on the parity calculation part, achieving effective use of the parity calculation part.

[0086] According to a ninth aspect, in the eighth aspect, the disk array device further comprises:

[0087] a table including a time period during which the parity calculation part can operate calculation of parity, wherein

[0088] the control part, by referring to the time period included in the table, issues the recovery instruction when the parity calculation part is not operating calculation of parity.

[0089] In the ninth aspect, the controller can recognize the timing of issuing a recovery instruction merely by referring to the time period in the table.

[0090] A tenth aspect of the present invention is directed to a disk array device executing read operation for reading data recorded therein in response to a first read request from a host device, the disk array device having recorded therein data blocks generated by dividing the data and redundant data generated from the data blocks, comprising:

[0091] m disk drives across which the data blocks and the redundant data are distributed;

[0092] a parity calculation part operating calculation of parity from (m-2) of the data blocks and the redundant data to recover one remaining data block; and

[0093] a control part controlling the read operation;

[0094] the control part

[0095] in response to the first read request received thereto, determining whether or not (m-1) of the disk drives have previously failed to read each data block;

[0096] when determining that the (m-1) disk drives have not previously failed to read each of the data blocks, issuing second read requests to the (m-1) disk drives to read only the data blocks; and

[0097] when the data blocks are read from the (m-1) disk drives, executing operation for transmitting the data to the host device.

[0098] In the tenth aspect, in some cases, a second read request may not be issued for the redundant data. That is, when the redundant data is not required, such unnecessary redundant data is not read. As a result, it is possible to increase the volume of data which can be read per unit of time.
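The tenth through twelfth aspects might be sketched as follows (the failure table and all names are illustrative assumptions):

    failure_table = set()    # recording areas that previously failed to read

    def issue_second_read_requests(data_areas, parity_area):
        # If any data block previously failed, read the redundant data too,
        # so the calculation of parity can start immediately (eleventh aspect).
        if any(area in failure_table for area in data_areas):
            return list(data_areas) + [parity_area]
        # No recorded failures: skip the unnecessary redundant data (tenth aspect).
        return list(data_areas)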

[0099] According to an eleventh aspect, in the tenth aspect,

[0100] the control part

[0101] when determining that the (m-1) disk drives have previously failed to read a data block, issues second read requests to the m disk drives to read the (m-1) data blocks and the redundant data;

[0102] when the (m-1) disk drives complete reading, detects whether or not a set of the data blocks and the redundant data has been read from the (m-1) disk drives;

[0103] when detecting that the set of the data blocks and the redundant data has been read, issues a recovery instruction to the parity calculation part to recover the data block not read from the one remaining disk drive; and

[0104] when the one remaining data block is recovered by the calculation of parity in the parity calculation part, executes operation for transmitting the data to the host device.

[0105] In the eleventh aspect, a second read request is issued for reading the redundant data when it is required. Therefore, it is possible to immediately execute the calculation of parity.

[0106] According to a twelfth aspect, in the eleventh aspect, the disk array device further comprises:

[0107] a table registering therein recording areas of the data blocks which the disk drives have previously failed to read, wherein

[0108] the control part determines, by referring to the table, whether to issue the second read requests to the (m-1) disk drives or to the m disk drives.

[0109] In the twelfth aspect, the controller can easily determine whether to issue a second read request for reading the redundant data merely by referring to the table.

[0110] According to a thirteenth aspect, in the twelfth aspect, the disk array device further comprises:

[0111] a reassignment part, when a defect occurs in a recording area of the data block or redundant data in the m disk drives, executing reassign processing for assigning an alternate recording area to the defective recording area, wherein

[0112] when the reassignment part assigns the alternate recording area to the defective recording area of the data block registered in the table, the control part deletes the defective recording area of the data block from the table.

[0113] In the thirteenth aspect, an alternate recording area is assigned to the defective recording area, and the data block or redundant data is rewritten in this alternate area. Therefore, the number of data blocks registered in the table which require a long time in read operation can be reduced. Therefore, it is possible to provide a disk array device capable of reading a larger volume of data per unit of time.

[0114] According to a fourteenth aspect, in the thirteenth aspect, the disk array device further comprises:

[0115] a first table storage part storing a first table in which an address of the alternate recording area previously reserved in each of the m disk drives can be registered as alternate recording area information; and

[0116] a second table storage part storing a second table in which address information of the alternate recording area assigned to the defective recording area can be registered, wherein

[0117] the reassignment part

[0118] when the second read requests are transmitted from the control part to the m disk drives, measures a delay time in each of the disk drives;

[0119] determines whether or not each recording area of the data blocks or the redundant data to be read by each second read request is defective, based on the measured delay time;

[0120] when determining that the recording area is defective, assigns the alternate recording area to the defective recording area based on the alternate recording area information registered in the first table of the first table storage part; and

[0121] registers the address information of the assigned alternate recording area in the second table of the second table storage part,

[0122] the control part issues the second read requests based on the address information registered in the second table of the second table storage part, and

[0123] the delay time is a time period calculated from a predetermined process start time.

[0124] In the fourteenth aspect, the reassignment part determines whether or not the recording area is defective based on an elapsed time calculated from a predetermined process start time. When the delay in the response returned from the disk drive is large, the reassignment part determines that the recording area being accessed for reading is defective, and assigns an alternate recording area. This allows the disk array device to read and transmit the data to the host device while suppressing the occurrence of a delay in response.
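A sketch of the fourteenth aspect's delay-based reassignment (all names and the threshold value are illustrative assumptions, not the disclosed implementation):

    import time

    DELAY_LIMIT = 0.05    # assumed threshold, in seconds, marking a defect

    def timed_read(drive, area, first_table, second_table):
        # first_table: drive -> reserved alternate areas (the first table);
        # second_table: defective area -> assigned alternate area (the second table).
        start = time.monotonic()            # predetermined process start time
        data = drive.read(second_table.get(area, area))
        delay = time.monotonic() - start    # measured delay time
        if delay > DELAY_LIMIT:             # recording area deemed defective
            alternate = first_table[drive].pop()
            second_table[area] = alternate  # later second read requests use it
        return data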

[0125] According to a fifteenth aspect, in the first aspect, the disk array device further comprises:

[0126] a reassignment part, when a defect occurs in a recording area of the data block or redundant data in the m disk drives, executing reassign processing for assigning an alternate recording area to the defective recording area.

[0127] According to a sixteenth aspect, in the fifteenth aspect, the disk array device further comprises:

[0128] a first table storage part storing a first table in which an address of the alternate recording area previously reserved in each of the m disk drives can be registered as alternate recording area information; and

[0129] a second table storage part storing a second table in which address information of the alternate recording area assigned to the defective recording area can be registered, wherein

[0130] the reassignment part

[0131] when the second read requests are transmitted from the control part to the m disk drives, measures a delay time in each of the disk drives;

[0132] determines whether or not each recording area of the data blocks or the redundant data to be read by each second read request is defective, based on the measured delay time;

[0133] when determining that the recording area is defective, assigns the alternate recording area to the defective recording area based on the alternate recording area information registered in the first table of the first table storage part; and

[0134] registers the address information of the assigned alternate recording area in the second table of the second table storage part,

[0135] the control part issues the second read requests based on the address information registered in the second table of the second table storage part, and

[0136] the delay time is a time period calculated from a predetermined process start time.

[0137] According to a seventeenth aspect, in the sixteenth aspect,

[0138] the reassignment part assigns the alternate recording area to the defective recording area only when determining successively, a predetermined number of times, that the recording area is defective.

[0139] In the seventeenth aspect, when the reassignment part successively determines, a predetermined number of times, that the recording area may possibly be defective, it assigns an alternate recording area to that recording area. Therefore, if the reassignment part sporadically and wrongly determines that a recording area is defective, no alternate recording area is assigned to that recording area. Therefore, it is possible to provide a disk array device which assigns an alternate recording area only to a truly defective area.

[0140] According to an eighteenth aspect, in the sixteenth aspect,

[0141] the predetermined process start time is a time when each of the second read requests is transmitted to each of the m disk drives.

[0142] According to a nineteenth aspect, in the sixteenth aspect,

[0143] the predetermined process start time is a time when the m disk drives start reading based on the second read requests.

[0144] In the eighteenth or nineteenth aspect, the reassignment part can recognize the delay time correctly.

[0145] A twentieth aspect of the present invention is directed to a data input/output method used for a disk array device comprising a disk array constructed of recording mediums for recording redundant data and an array controller for controlling the disk array according to an access request transmitted from a host device, the method comprising the steps of:

[0146] generating by the array controller a read or write request to the disk array with predetermined priority based on the received access request;

[0147] enqueuing by the array controller the generated read or write request to a queue included therein according to the predetermined priority;

[0148] selecting by the array controller the read or write request to be processed by the disk array from among the read or write requests enqueued to the queue according to the predetermined priority; and

[0149] processing by the disk array the selected read or write request.

[0150] In the twentieth aspect, the array controller converts the received access request to a read or write request with predetermined priority. The disk array processes the read or write request selected by the array controller according to priority. Therefore, in the disk array device including the disk array in which redundant data is recorded, it is possible to generate a read or write request with relatively high priority for an access request required to be processed in real time, and a read or write request with relatively low priority for an access request not required to be processed in real time. Thus, the disk array device can distinguish the access requests from the host device according to the requirement of real-time processing. Consequently, an access request required to be processed in real time is processed in the disk array device without being affected by access requests not required to be processed in real time.
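The queue-per-priority mechanism of the twentieth aspect might be sketched as follows (two illustrative priority levels are assumed; the disclosure does not fix their number):

    from collections import deque

    queues = {"high": deque(), "low": deque()}    # one queue per priority

    def enqueue(request, priority):
        queues[priority].append(request)

    def select_next():
        # The array controller picks the next request for the disk array;
        # real-time (e.g. video) requests drain before best-effort ones.
        for level in ("high", "low"):
            if queues[level]:
                return queues[level].popleft()
        return None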

[0151] According to a twenty-first aspect, in the twentieth aspect,

[0152] the array controller includes queues therein corresponding to the priority; and

[0153] the generated read request or write request is enqueued to the queue corresponding to the predetermined priority.

[0154] In the twenty-first aspect, since a queue is provided for each level of priority, it is possible to distinguish the access requests from the host device according to the requirement of real-time processing, and various processing in the disk array device is effectively processed.

[0155] According to a twenty-second aspect, in the twentieth aspect,

[0156] the array controller includes queues therein corresponding to the predetermined priority for each of the recording mediums,

[0157] the array controller generates the read or write request with the predetermined priority for each of the recording mediums based on the received access request, and

[0158] the array controller enqueues the read or write request generated for each of the recording mediums to the queue in the corresponding recording medium according to the predetermined priority.

[0159] In the twenty-second aspect, since a queue is provided for each recording medium and each level of priority, it is possible to distinguish the access requests from the host device for each recording medium according to the requirement of real-time processing, and various processing in the disk array device is processed still more effectively.

[0160] According to a twenty-third aspect, in the twentieth aspect,

[0161] the predetermined priority is set based on whether or not processing in the disk array is executed in real time.

[0162] In the twenty-third aspect, the predetermined priority is set based on the requirement of real-time processing. Consequently, an access request required to be processed in real time is processed in the disk array device without being affected by access requests not required to be processed in real time.

[0163] According to a twenty-fourth aspect, in the twentieth aspect,

[0164] when the I/O interface between the information recording device and the host device conforms to SCSI,

[0165] the predetermined priority is previously set in a LUN or LBA field of the access request.

[0166] In the twenty-fourth aspect, the predetermined priority is previously set in the access request. Therefore, the host device can notify the disk array device of the level of priority of the read or write request, that is, with how much priority the read or write request is required to be processed.

[0167] A twenty-fifth aspect of the present invention is directed to a disk array device including a disk array constructed of recording mediums for recording redundant data, the disk array device controlling the disk array according to an access request transmitted from a host device, comprising:

[0168] a control part generating a read or write request to the disk array with predetermined priority based on the received access request;

[0169] a queue managing part enqueuing the read request or write request generated by the control part to a queue included therein according to the predetermined priority; and

[0170] a selection part selecting the read or write request to be processed by the disk array from among the read or write requests enqueued to the queue, wherein

[0171] the disk array processes the read request or write request selected by the selection part.

[0172] In the twenty-fifth aspect, the received access request is converted into a read or write request with predetermined priority. The disk array processes the read or write request selected by the selection part according to the level of priority. Therefore, in the disk array device including the disk array in which redundant data is recorded, it is possible to generate a read or write request with relatively high priority for an access request required to be processed in real time, and a read or write request with relatively low priority for an access request not required to be processed in real time. Thus, the disk array device can distinguish the access requests from the host device according to the requirement of real-time processing. Consequently, an access request required to be processed in real time is processed in the disk array device without being affected by access requests not required to be processed in real time.

[0173] According to a twenty-sixth aspect, in the twenty-fifth aspect,

[0174] the queue managing part includes queues therein corresponding to the priority, and

[0175] the read or write request generated by the control part is enqueued to the queue corresponding to the predetermined priority.

[0176] In the twenty-sixth aspect, since a queue is provided for each level of priority, it is possible to distinguish the access requests from the host device according to the requirement of real-time processing, and various processing in the disk array device is effectively processed.

[0177] According to a twenty-seventh aspect, in the twenty-fifth aspect,

[0178] the queue managing part includes queues therein corresponding to the predetermined priority for each of the recording mediums,

[0179] the control part generates the read or write request with the predetermined priority for each of the recording mediums based on the received access request; and

[0180] the queue managing part enqueues the read or write request generated for each of the recording mediums to the queue in the corresponding recording medium according to the predetermined priority.

[0181] In the twenty-seventh aspect, since a queue is provided for each recording medium and each level of priority, it is possible to distinguish the access requests from the host device for each recording medium according to the requirement of real-time processing, and various processing in the disk array device is processed still more effectively.

[0182] A twenty-eighth aspect of the present invention is directed to, in an information recording device comprising a disk array constructed of recording mediums for recording redundant data and an array controller for controlling the disk array according to an access request transmitted from a host device, a data reconstruction method for recovering data recorded on a faulty recording medium in the disk array and reconstructing the data, the method comprising the steps of:

[0183] generating by the array controller a read or write request required for data reconstruction to the disk array with predetermined priority;

[0184] enqueuing by the array controller the generated read or write request to a queue included therein according to the predetermined priority;

[0185] selecting by the array controller the read or write request to be processed from among the read or write requests enqueued to the queue according to the predetermined priority;

[0186] processing by the disk array the selected read or write request; and

[0187] executing by the array controller data reconstruction based on the processing results of the read or write request by the disk array.

[0188] In the twenty-eighth aspect, the array controller generates a read or write request for data reconstruction. The generated read or write request has predetermined priority. The disk array processes the read or write request selected by the array controller according to the level of priority. Therefore, when the disk array device which executes reconstruction processing gives relatively low priority to the read or write request for data reconstruction, the read or write request is processed without affecting other real-time processing. On the other hand, when the disk array device gives relatively high priority, the read or write request is processed with priority, ensuring the end time of data reconstruction.

[0189] According to a twenty-ninth aspect, in the twenty-eighth aspect,

[0190] the array controller includes queues therein corresponding to the predetermined priority for each of the recording mediums,

[0191] the array controller generates the read or write request required for data reconstruction with the predetermined priority for each recording medium, and

[0192] the array controller enqueues the generated read or write request to the queue in the corresponding recording medium according to the predetermined priority.

[0193] In the twenty-ninth aspect, since a queue is provided for each recording medium and each level of priority, and further, since the array controller generates a read or write request with predetermined priority for each recording medium, it is possible to distinguish the access requests from the host device for each recording medium according to the requirement of real-time processing, and various processing in the disk array device is processed still more effectively.

[0194] According to a thirtieth aspect, in the twenty-eighth aspect,

[0195] the read and write requests generated by the array controller are given lower priority to be processed in the disk array.

[0196] In the thirtieth aspect, since they have relatively lower priority, the read or write requests are processed without affecting other real-time processing.

[0197] According to a thirty-first aspect, in the twenty-eighth aspect,

[0198] the read and write requests generated by the array controller are given higher priority to be processed in the disk array.

[0199] In the thirty-first aspect, since they have relatively higher priority, the read or write requests are processed with priority, ensuring the end time of data reconstruction.

[0200] A thirty-second aspect of the present invention is directed to a data input/output method used in an information recording device comprising a disk array constructed of recording mediums for recording redundant data and an array controller for controlling the disk array according to an access request transmitted from a host device, the method recovering the data recorded on the recording medium which has a failure in the disk array, and reconstructing the data in a spare recording medium;

[0201] when the access request for data to be reconstructed in the spare recording medium is transmitted from the host device to the information recording device, the method comprising the steps of:

[0202] the array controller

[0203] reading data for recovery required for recovering the data recorded in the failed recording medium from the disk array;

[0204] recovering the data recorded in the failed recording medium by executing predetermined calculation with the data for recovery read from the disk array;

[0205] generating a write request with predetermined priority to write the recovered data in the spare recording medium;

[0206] enqueuing the generated write request to a queue therein according to the predetermined priority; and

[0207] selecting the generated write request as the write request to be processed by the disk array according to the predetermined priority, and

[0208] the disk array

[0209] processing the write request selected by the array controller, and writing the recovered data in the spare recording medium, wherein

[0210] the write request is given relatively lower priority.

[0211] In the thirty-second aspect, when the host device transmits an access request for data to be reconstructed in the spare recording medium, the array controller recovers the data and writes it in the spare recording medium. Therefore, the next time the disk array device executes data reconstruction, it is not required to recover the data requested to be accessed. The time required for data reconstruction is thus shortened.

[0212] A thirty-third aspect of the present invention is directed to a disk array device which reassigns an alternate recording area to a defective recording area of data, comprising:

[0213] a read/write control part for specifying a recording area of data, and producing an I/O request to request read or write operation;

[0214] a disk drive, when receiving the I/O request transmitted from the read/write control part, accessing the recording area specified by the I/O request to read or write the data; and

[0215] a reassignment part, when receiving the I/O request transmitted from the read/write control part, calculating an elapsed time from a predetermined process start time as a delay time and determining whether or not the recording area specified by the I/O request is defective based on the delay time; wherein

[0216] when determining that the recording area of the data is defective, the reassignment part instructs the disk drive to assign the alternate recording area to the defective recording area.

[0217] In the thirty-third aspect, the reassignment part determines whether or not the recording area of the data specified by the received I/O request is defective based on a delay time calculated from a predetermined process start time. The reassignment part can determine the length of a delay in response from the disk drive based on the delay time. When determining that the recording area is defective, the reassignment part instructs the disk drive to assign an alternate recording area. That is, when the processing time for one recording area in the disk drive is long, the reassignment part determines that that recording area is defective, and instructs the disk drive to perform reassign processing. The disk array device thus suppresses the occurrence of a long delay in response, allowing data input/output in real time.

[0218] According to a thirty-fourth aspect, in the thirty-third aspect,

[0219] the reassignment part assigns the alternate recording area to the defective recording area only when determining successively, a predetermined number of times, that the recording area is defective.

[0220] In the thirty-fourth aspect, when the reassignment part determines successively, a predetermined number of times, that one recording area is defective, an alternate recording area is assigned to that recording area. Therefore, the reassignment part can suppress sporadic determination errors due to thermal asperity in the disk drive and the like. Therefore, the reassignment part can instruct the disk drive to assign an alternate recording area only to a truly defective area.

[0221] According to a thirty-fifth aspect, in the thirty-third aspect,

[0222] the predetermined process start time is a time when the I/O request is transmitted from the read/write control part.

[0223] According to a thirty-sixth aspect, in the thirty-third aspect,

[0224] the predetermined process start time is a time when the I/O request transmitted from the read/write control part starts to be processed in the disk drive.

[0225] In the thirty-fifth or thirty-sixth aspect, the predetermined process start time is the time when the I/O request is transmitted to the disk drive or the time when the I/O request starts to be processed. Therefore, the reassignment part can recognize the delay time correctly.

[0226] According to a thirty-seventh aspect, in the thirty-third aspect,

[0227] the reassignment part further instructs the disk drive to terminate the read or write operation requested by the I/O request when the recording area of the data is defective.

[0228] In the thirty-seventh aspect, the reassignment part instructs the disk drive to terminate processing of the I/O request specifying the recording area which has been determined to be defective. When the reassignment part determines that the recording area is defective, the disk drive can terminate processing of the I/O request for that defective area, suppressing the occurrence of an additional delay in response.

[0229] A thirty-eighth aspect of the present invention is directed to a disk array device which reassigns an alternate recording area to a defective recording area of data, comprising:

[0230] a read/write control part specifying a recording area of the data, and producing an I/O request to request read or write operation;

[0231] a disk drive, when receiving the I/O request from the read/write control part, accessing the recording area specified by the I/O request to read or write the data; and

[0232] a reassignment part, when the recording area specified by the I/O request from the read/write control part is defective, instructing the disk drive to reassign the alternate recording area to the defective recording area, wherein

[0233] when instructed to reassign by the reassignment part, the disk drive assigns a recording area in which the time required for the read or write operation is within a predetermined range, as the alternate recording area.

[0234] In the thirty-eighth aspect, the disk drive takes a recording area in which the time required for read or write operation is within a predetermined range as the alternate recording area. Therefore, the disk array device can suppress the occurrence of a large delay in response, allowing input/output of data in real time.
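A sketch of the thirty-eighth aspect's selection rule (the candidate list and the measurement function are illustrative assumptions):

    def choose_alternate(candidates, measure_access_time, limit):
        # Pick an alternate recording area whose measured read/write time
        # stays within the predetermined range.
        for area in candidates:
            if measure_access_time(area) <= limit:
                return area
        raise RuntimeError("no alternate area meets the time bound")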

[0235] According to a thirty-ninth aspect, in the thirty-eighth aspect,

[0236] the predetermined range is selected based on overhead in the disk array device.

[0237] In the thirty-ninth aspect, the predetermined range is easily selected based on overhead, which is a known parameter. Therefore, the design of the disk array device can be simplified.

[0238] According to a fortieth aspect, in the thirty-eighth aspect,

[0239] when part or all of the recording areas of the data are defective, the reassignment part assumes that the whole recording area is defective.

[0240] In the fortieth aspect, in the disk array device, the alternate recording area is not assigned by fixed-block units, which are the managing units in the disk drive. Therefore, the disk array device can prevent data fragmentation, further suppressing the occurrence of a large delay in response.

[0241] According to a forty-first aspect, in the thirty-eighth aspect,

[0242] the reassignment part transmits a reassign block specifying a logical address block of the defective recording area to the disk drive for reassignment; and

[0243] the disk drive assigns, as the alternate recording area, a physical address with which the time required for read or write operation is within the predetermined range to a logical address specified by the reassign block transmitted from the reassignment part.

[0244] In the forty-first aspect, the disk drive assigns a physical address in which the time required for read or write operation is within a predetermined range as the alternate recording area to the logical address on which reassign processing is to be performed. Therefore, the disk array device can suppress occurrence of a large delay in response, allowing input/output of data in real time.
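
The forty-first aspect is, in effect, a constrained address-translation step. As an illustration only, it might be sketched as follows; the helper names (assign_alternate, access_time, spare_blocks) and the access-time estimate are assumptions, not part of the disclosed device.

    # A minimal sketch, assuming the disk drive keeps a pool of spare
    # physical blocks and can estimate the access time of each one.
    def assign_alternate(logical_addr, spare_blocks, access_time, limit):
        """Map the logical address named in the reassign block to a spare
        physical block whose access time is within the predetermined range."""
        for phys in sorted(spare_blocks, key=access_time):
            if access_time(phys) <= limit:
                spare_blocks.remove(phys)
                return {logical_addr: phys}  # updated logical-to-physical map
        raise RuntimeError("no spare block within the predetermined range")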

[0245] According to a forty-second aspect, in the thirty-eighth aspect,

[0246] when the read/write control part requests the disk drive to read the data, and the recording area of the data is defective, the data recorded in the defective recording area is recovered based on predetermined parity and other data; and

[0247] the read/write control part specifies the assigned alternate recording area, and requests the disk drive to write the recovered data.

[0248] According to a forty-third aspect, in the thirty-eighth aspect,

[0249] when the read/write control part requests the disk drive to write data and the recording area of the data is defective,

[0250] the read/write control part specifies the assigned alternate recording area, and requests the disk drive again to write the data.

[0251] When the disk drive assigns an alternate recording area to one recording area, the data recorded thereon might be impaired. Therefore, in the forty-second or forty-third aspect, the read/write control part requests the disk drive to write the data recovered based on the parity or other data, or specifies the alternate recording area to request the disk drive again to write the data. Therefore, the disk array device can maintain consistency before and after assignment of the alternate recording area.

[0252] A forty-fourth aspect of the present invention is directed to a reassignment method of assigning an alternate recording area to a defective recording area of data, comprising the steps of:

[0253] transmitting an I/O request for requesting the disk drive to perform read or write operation by specifying a recording area of the data according to a request from outside; and

[0254] when the I/O request is transmitted in the transmission step, calculating an elapsed time from a predetermined time as a delay time and determining whether the recording area specified by the I/O request is defective or not based on the delay time; wherein

[0255] when the recording area is defective in the determination step, the disk drive is instructed to assign the alternate recording area to the defective recording area.

[0256] A forty-fifth aspect of the present invention is directed to a reassignment method of assigning an alternate recording area to a defective recording area of data, comprising the steps of:

[0257] transmitting an I/O request for requesting the disk drive to perform read or write operation by specifying a recording area of the data according to a request from outside; and

[0258] when the recording area specified by the I/O request transmitted in the transmission step is defective, instructing the disk drive to assign the alternate recording area to the defective recording area, wherein

[0259] in the instructing step, the disk drive is instructed to assign the recording area with which time required for read or write operation is within a predetermined range as the alternate recording area.

[0260] A forty-sixth aspect of the present invention is directed to a disk array device which assigns an alternate recording area to a defective recording area of data, comprising:

[0261] a read/write control part for transmitting an I/O request for requesting read or write operation by specifying a recording area of the data according to a request from outside;

[0262] a disk drive, when receiving the I/O request from the read/write control part, accessing the recording area specified by the I/O request and reading or writing the data;

[0263] a reassignment part, when receiving the I/O request from the read/write control part, calculating an elapsed time from a predetermined process start time as a delay time, and determining whether the recording area specified by the I/O request is defective or not based on the delay time;

[0264] a first storage part storing an address of the alternate recording area previously reserved in the disk drive as alternate recording area information; and

[0265] a second storage part storing address information of the alternate recording area assigned to the defective recording area; wherein

[0266] when determining that the specified recording area is defective, the reassignment part assigns the alternate recording area to the defective recording area based on the alternate recording area information stored in the first storage part, and stores the address information on the assigned alternate recording area in the second storage part, and

[0267] the read/write control part generates the I/O request based on the address information stored in the second storage part.

[0268] In the forty-sixth aspect, the reassignment part determines whether the recording area is defective or not based on the delay time calculated from a predetermined process start time. Therefore, when a delay in the response returned from the disk drive is large, the reassignment part determines that the recording area being accessed is defective, and assigns an alternate recording area. This allows the disk array device to input and output data in real time, while suppressing occurrence of a large delay in response.
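
As a rough, non-authoritative sketch of the forty-sixth aspect, the reassignment part's delay-time test might look like the following; the threshold value and every name below are assumptions made for illustration.

    import time

    DELAY_THRESHOLD = 0.2  # seconds; an assumed defect-detection threshold

    def check_delay(start_time, block_addr, reserved_pool, address_map):
        """Treat the recording area as defective when the elapsed time from
        the predetermined process start time exceeds the threshold, and
        assign an alternate area from the pool reserved in the disk drive
        (first storage part), recording the mapping (second storage part)."""
        delay = time.monotonic() - start_time
        if delay > DELAY_THRESHOLD:
            alternate = reserved_pool.pop()      # alternate recording area info
            address_map[block_addr] = alternate  # address information
            return alternate
        return block_addr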

[0269] According to a forty-seventh aspect, in the forty-sixth aspect,

[0270] the reassignment part assigns the alternate recording area to the defective recording area only when determining successively a predetermined number of times that the recording area is defective.

[0271] According to a forty-eighth aspect, in the forty-sixth aspect,

[0272] the predetermined process start time is a time when the I/O request is transmitted from the read/write control part.

[0273] According to a forty-ninth aspect, in the forty-sixth aspect,

[0274] the predetermined process start time is a time when the I/O request transmitted from the read/write control part starts to be processed in the disk drive.

[0275] According to a fiftieth aspect, in the forty-sixth aspect,

[0276] the reassignment part further instructs the disk drive to terminate the read or write operation requested by the I/O request when detecting that the recording area of the data is defective.

[0277] According to a fifty-first aspect, in the forty-sixth aspect,

[0278] the first storage part stores a recording area with which overhead in the disk drive is within a predetermined range as the alternate recording area.

[0279] In the fifty-first aspect, the first storage part manages the alternate recording areas in which the time required for read or write operation in the disk drive is within a predetermined range. Therefore, the data recorded on the alternate recording area assigned by the reassignment part is always inputted/outputted with a short delay in response. The disk array device thus can input and output data in real time, while suppressing occurrence of a large delay in response. Furthermore, the predetermined range is easily selected based on overhead, which is a known parameter. Therefore, the design of the disk array device can be further simplified.

[0280] According to a fifty-second aspect, in the fifty-first aspect,

[0281] the first storage part further stores the alternate recording area in units of the size of the data requested by the I/O request.

[0282] In the fifty-second aspect, since the first storage part manages the alternate recording areas in units of the requested data size, the alternate recording area to be assigned is equal in size to the requested data. Therefore, the reassignment part can instruct reassignment with the simple processing of selecting an alternate recording area from the first storage part.

[0283] According to a fifty-third aspect, in the fifty-second aspect,

[0284] whether the overhead is within the predetermined range or not is determined, by the unit, for the recording areas other than the alternate recording area, and

[0285] the reassignment part assigns the alternate recording area to the recording area in which the overhead is not within the predetermined range.

[0286] In the fifty-third aspect, the reassignment part instructs assignment of an alternate recording area to the defective recording area at timing other than that determined based on the delay time. The disk array device thus can input and output data more effectively in real time, while suppressing occurrence of a large delay in response. Furthermore, the predetermined range is easily selected based on overhead, which is a known parameter. Therefore, the design of the disk array device can be further simplified.

[0287] According to a fifty-fourth aspect, in the forty-sixth aspect,

[0288] the address information stored in the second storage part is recorded in the disk drive.

[0289] In the fifty-fourth aspect, with the address information recorded on the disk drive, the second storage part is not required to hold the address information while the power to the disk array device is off. That is, the second storage part need not be constructed from a non-volatile storage device, which is expensive, but can be constructed from a volatile storage device at a low cost.

[0290] According to a fifty-fifth aspect, in the fifty-fourth aspect, the disk array device further comprises:

[0291] a non-volatile storage device storing an address of a recording area of the address information in the disk drive.

[0292] In the fifty-fifth aspect, since the non-volatile storage device stores the address of the address information, even when a defect occurs in the storage area of the address information in the disk drive, the address information is secured. It is thus possible to provide a disk array device with a high level of security.

[0293] According to a fifty-sixth aspect, in the forty-sixth aspect, the disk array device further comprises:

[0294] a plurality of disk drives including data recording disk drives and a spare disk drive; and

[0295] a count part counting a used amount or remaining amount of the alternate recording areas, wherein

[0296] the reassignment part determines whether to copy the data recorded in the data recording disk drives to the spare disk drive based on the count value in the count part, thereby allowing the spare disk drive to be used instead of the data recording disk drive.

[0297] In the fifty-sixth aspect, when there is a shortage of alternate recording areas in the disk drive for recording data, a spare disk drive is used. Therefore, alternate recording areas for reassignment never run short. The disk array device thus can input and output data more effectively in real time, while suppressing occurrence of a large delay in response.
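
The count-based switch to the spare disk drive in the fifty-sixth aspect reduces to a simple comparison; a minimal sketch, with an assumed threshold and hypothetical helper names:

    SPARE_THRESHOLD = 8  # assumed minimum number of remaining alternate areas

    def maybe_switch_to_spare(drive, remaining_alternates, copy_to_spare):
        """When the count part shows the alternate recording areas of a data
        recording disk drive running short, copy its data to the spare disk
        drive so the spare can be used in its place."""
        if remaining_alternates(drive) < SPARE_THRESHOLD:
            copy_to_spare(drive)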

[0298] A fifty-seventh aspect of the present invention is directed to a reassignment method of assigning an alternate recording area to a defective recording area of data, comprising the steps of:

[0299] transmitting an I/O request for requesting read or write operation by specifying a recording area of the data; and

[0300] when the recording area specified by the I/O request transmitted in the transmission step is defective, assigning the alternate recording area to the defective recording area, wherein

[0301] in the assign step,

[0302] when the specified recording area is defective, the alternate recording area is selected for the defective recording area by referring to alternate recording area information for managing an address of the alternate recording area previously reserved in the disk drive, the selected alternate recording area is assigned to the defective recording area, and further address information for managing an address of the assigned alternate recording area is created; and

[0303] in the transmission step, the I/O request is generated based on the address information created in the assign step.

[0304] According to a fifty-eighth aspect, in the fifty-seventh aspect,

[0305] in the assign step, when the I/O request is transmitted, an elapsed time from a predetermined process start time is calculated as a delay time, and it is determined whether the recording area specified by the I/O request is defective or not based on the delay time.

[0306] These and other objects, features, aspects and advantages of the present invention will become more apparent from the following detailed description of the present invention when taken in conjunction with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

[0307] FIG. 1 is a block diagram showing the structure of a disk array device according to a first embodiment of the present invention;

[0308] FIG. 2 is a diagram showing the detailed structure of buffer memories 3A to 3D, 3P and 3R shown in FIG. 1;

[0309] FIGS. 3a and 3b are conceptual diagrams showing parity groups;

[0310] FIGS. 4a and 4b are flow charts showing the procedure executed by a controller 7 according to the first embodiment;

[0311] FIGS. 5a and 5b are diagrams illustrating one technical effect of the disk array device shown in FIG. 1;

[0312] FIGS. 6a and 6b are diagrams illustrating change in reading order in disk drives 5A to 5D and 5P shown in FIG. 1;

[0313] FIGS. 7a and 7b are diagrams illustrating another technical effect of the disk array device shown in FIG. 1;

[0314] FIGS. 8a and 8b are flow charts illustrating the procedure of the controller 7 according to a second embodiment of the present invention;

[0315] FIG. 9 is a diagram showing an issue time table 71 in the controller 7;

[0316] FIGS. 10a and 10b are diagrams illustrating one technical effect of the second embodiment;

[0317] FIG. 11 is a block diagram showing the structure of a disk array device according to a third embodiment of the present invention;

[0318] FIGS. 12a and 12b are flow charts showing the procedure of the controller 7 shown in FIG. 11;

[0319] FIGS. 13a and 13b are diagrams illustrating a probability distribution curve f(t) and a time margin t_(MARGIN);

[0320] FIG. 14a is a diagram illustrating a case in which four data blocks are stored in step S44 of FIG. 12;

[0321] FIG. 14b is a diagram illustrating a case in which a first timer 72 is timed-out in step S45 of FIG. 12;

[0322] FIG. 15 is a block diagram showing the structure of a disk array device according to a fourth embodiment of the present invention;

[0323] FIG. 16 is a flow chart to be executed by the controller 7 shown in FIG. 15 at reading processing;

[0324] FIG. 17 is a reservation table 73 to be created by the controller 7 shown in FIG. 15 in a recording area therein;

[0325] FIG. 18 is a diagram illustrating a specific example of reading processing in the disk array device shown in FIG. 15;

[0326] FIG. 19 is a block diagram showing the structure of a disk array device according to a fifth embodiment of the present invention;

[0327] FIG. 20 is a conceptual diagram showing data blocks and redundant data distributed across the disk drives 5A to 5D and 5P shown in FIG. 19;

[0328] FIG. 21 is a flow chart showing the procedure of the controller 7 shown in FIG. 19;

[0329] FIG. 22 is a diagram showing a faulty block table 75 to be created by the controller 7 shown in FIG. 19 in a recording area therein;

[0330] FIGS. 23a and 23b are diagrams illustrating one technical effect of the fifth embodiment;

[0331] FIG. 24 is a block diagram showing the structure of a disk array device according to a sixth embodiment of the present invention;

[0332] FIG. 25 is a diagram showing a first table 91 being managed by a first table storage part 9 shown in FIG. 24;

[0333] FIG. 26 is a flow chart illustrating the procedure of the controller 7 after the arrival of a first read request;

[0334] FIG. 27 is a diagram showing a second table 10 being managed by a second table storage part 10 shown in FIG. 24;

[0335] FIG. 28 is a flow chart showing the procedure of the controller 7 after the arrival of one read response;

[0336] FIG. 29 is a block diagram showing the detailed structure of SCSI interfaces 4A to 4D and 4P shown in FIG. 24 and a reassignment part 8;

[0337] FIG. 30 is a flow chart showing the procedure of the reassignment part 8 after the arrival of a transmission notification;

[0338] FIG. 31 is a diagram illustrating a first list 82 and a second list 83 shown in FIG. 29;

[0339] FIG. 32 is a flow chart showing the procedure of reassignment to be executed by the reassignment part 8 shown in FIG. 24;

[0340] FIG. 33 is a flow chart showing the procedure of the reassignment part 8 after the arrival of a receive notification;

[0341] FIG. 34 is a flow chart showing the procedure of the reassignment part 8 after the arrival of a read termination request;

[0342] FIG. 35 is a block diagram showing the structure of a disk array device according to a seventh embodiment of the present invention;

[0343] FIG. 36 is a flow chart showing the procedure of the controller 7 after the arrival of a first read request;

[0344] FIG. 37 is a flow chart showing the procedure of the controller 7 after a REASSIGN-COMPLETED notification;

[0345] FIG. 38 is a flow chart showing the procedure of the controller 7 after the arrival of a REASSIGN-COMPLETED notification;

[0346] FIG. 39 is a block diagram showing the structure of a disk array device according to an eighth embodiment of the present invention;

[0347] FIG. 40 is a block diagram showing the detailed structure of a queue managing part 34, a request selection part 35, and a disk interface 36 shown in FIG. 39;

[0348] FIG. 41 is a diagram showing the detailed structure of a buffer managing part 37 shown in FIG. 39;

[0349] FIG. 42a shows a data format of Identify;

[0350] FIG. 42b shows a data format of Simple_Queue_Tag;

[0351] FIG. 43a shows a data format of Read_10;

[0352] FIG. 43b shows a data format of Write_10;

[0353] FIG. 44 is a flow chart showing operation of the disk array device when a host device requests writing;

[0354] FIG. 45 is a diagram showing a format of a first process request to be generated by a host interface 31;

[0355] FIG. 46 is a diagram showing a format of a first read request to be generated by a controller 33;

[0356] FIG. 47 is a flow chart showing the operation of the disk array device when the host device requests reading;

[0357] FIG. 48 is a flow chart showing the detailed procedure of step S1713 shown in FIG. 47;

[0358] FIG. 49 is a diagram showing management tables 39A to 39D stored in a table storage part 39;

[0359] FIG. 50 is a diagram showing types of status to be set in the management tables 39A to 39D;

[0360] FIG. 51 is a flow chart showing the overall procedure of first reconstruction processing;

[0361] FIG. 52 is a flow chart showing the detailed procedure of step S194 shown in FIG. 51;

[0362] FIG. 53 is a flow chart showing the overall procedure of second reconstruction processing;

[0363] FIG. 54 is a flow chart showing the detailed procedure of step S212 shown in FIG. 53;

[0364] FIG. 55 is a block diagram showing the structure of a disk array device 51 according to a ninth embodiment of the present invention;

[0365] FIG. 56 is a flow chart of operation of a read/write controller 73;

[0366] FIG. 57 is a flow chart showing operation of a reassignment part 75 when receiving a transmission notification;

[0367] FIG. 58 is a flow chart showing the procedure to be steadily executed by the reassignment part 75;

[0368] FIG. 59 is a flow chart showing operation of the reassignment part 75 when receiving a receive notification;

[0369] FIG. 60 is a diagram illustrating a first list 751 and a second list 752;

[0370] FIG. 61 is a diagram showing formats of REASSIGN BLOCKS;

[0371] FIG. 62 is a block diagram showing the structure of a disk array device 91 according to a tenth embodiment of the present invention;

[0372] FIG. 63 is a diagram illustrating alternate area information 1109 stored in a first storage part 1104;

[0373] FIG. 64 is a flow chart showing the procedure to be executed by a read/write controller 1102;

[0374] FIG. 65 is a diagram illustrating address information 1110 stored in a second storage part 1106;

[0375] FIG. 66 is a diagram illustrating the procedure to be steadily executed by a reassignment part 1103;

[0376] FIG. 67 is a flow chart showing the procedure after step S2713 shown in FIG. 66;

[0377] FIG. 68 is a diagram showing a counter included in a count part 1105;

[0378] FIG. 69 is a diagram showing a conventional disk array device adopting the RAID-3 architecture;

[0379] FIGS. 70a and 70b are diagrams illustrating a method of creating redundant data in the conventional disk array device;

[0380] FIGS. 71a and 71b are diagrams illustrating the problems in a first disk array device disclosed in Japanese Patent Laying-Open No. 2-81123; and

[0381] FIGS. 72a and 72b are diagrams illustrating the problems in a second disk array device disclosed in Japanese Patent Laying-Open No. 9-69027.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

First Embodiment

[0382] FIG. 1 is a block diagram showing the structure of a disk array device according to a first embodiment of the present invention. In FIG. 1, the disk array device includes a host interface 1, a selector 2, six buffer memories 3A to 3D, 3P, and 3R, five SCSI interfaces 4A to 4D and 4P, five disk drives 5A to 5D and 5P, a parity calculator 6, and a controller 7. Note that the controller 7 includes an issue time table 71, which is not used in the first embodiment but required in a second embodiment and thus described later.

[0383] FIG. 2 shows a detailed structure of the buffer memories 3A to 3D, 3P, and 3R in FIG. 1. In FIG. 2, the storage area of the buffer memory 3A is divided into a plurality of buffer areas 3A₁, 3A₂, 3A₃ . . . Each of the buffer areas 3A₁, 3A₂, 3A₃ . . . has a storage capacity (512 bytes, in the first embodiment) sufficient to store a single data block or redundant data. Further, an identifier (generally, the top address of the buffer area) is allocated to each buffer area to specify it.

[0384] The storage area of each of the other buffer memories 3B to 3D, 3P, and 3R is also divided into a plurality of buffer areas. An identifier is likewise allocated to each buffer area in the same manner as described for the buffer area 3A₁.
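
For illustration only, the buffer memories of FIG. 2 can be modeled as fixed-size areas indexed by their top address; the class and names below are hypothetical, not part of the disclosed device.

    BUFFER_AREA_SIZE = 512  # bytes; one data block or one piece of redundant data

    class BufferMemory:
        """Sketch of one buffer memory (e.g. 3A) divided into buffer areas
        3A_1, 3A_2, ...; each area's identifier is its top address."""
        def __init__(self, num_areas):
            self.areas = {i * BUFFER_AREA_SIZE: bytearray(BUFFER_AREA_SIZE)
                          for i in range(num_areas)}

        def store(self, identifier, data):
            assert len(data) == BUFFER_AREA_SIZE
            self.areas[identifier][:] = data  # store one block in the area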

[0385] Referring back to FIG. 1, a host device (not shown) is placed outside the disk array device. The host device is connected so as to bi-directionally communicate with the disk array device. To write data into the disk array device, the host device transmits a write request and data of 2048 bytes to the disk array device. For easy understanding of the first embodiment, assume that the data to be transmitted from the host device is 2048 bytes in size. The transmission data from the host device is generated, typically, by dividing video data into units of 2048 bytes.

[0386] In response to the write request and data, the RAID starts write operation. Since it is described in detail in the Background Art section, this write operation is only briefly described herein for the first embodiment with reference to FIGS. 3a and 3b. Assume that transmission data D-1 (refer to FIG. 3a) is inputted from the host device through the host interface 1 to the selector 2 of the disk array device. The selector 2 divides the data D-1 into four, generating data blocks D-A1, D-B1, D-C1, and D-D1 of 512 bytes each. The selector 2 transfers the data block D-A1 to the buffer memory 3A, the data block D-B1 to the buffer memory 3B, the data block D-C1 to the buffer memory 3C, and the data block D-D1 to the buffer memory 3D. The buffer memories 3A to 3D store the transferred data blocks D-A1 to D-D1, respectively.

[0387] The data blocks D-A1 to D-D1 are also sent to the parity calculator 6. The parity calculator 6 performs the parity calculation described in the Background Art section, generating redundant data D-P1 of 512 bytes from the data blocks D-A1 to D-D1. The redundant data D-P1 is transferred to the buffer memory 3P, and stored therein.
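
For illustration, the division performed by the selector 2 and the byte-wise XOR performed by the parity calculator 6 amount to the following sketch; the function names are hypothetical.

    BLOCK_SIZE = 512  # bytes per data block

    def stripe(data: bytes):
        """Divide 2048-byte transmission data into four 512-byte data blocks."""
        assert len(data) == 4 * BLOCK_SIZE
        return [data[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE] for i in range(4)]

    def make_parity(blocks):
        """Byte-wise XOR of the four data blocks yields the redundant data."""
        parity = bytearray(BLOCK_SIZE)
        for block in blocks:
            for i, b in enumerate(block):
                parity[i] ^= b
        return bytes(parity)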

[0388] Now, the buffer memories 3A to 3D store the data blocks D-A1 to D-D1, respectively, and the buffer memory 3P stores the redundant data D-P1. These data blocks D-A1 to D-D1 and redundant data D-P1 are generated based on the same data D-1 of 2048 bytes, and therefore belong to the same parity group. As described in the Background Art section, the parity group is a set of data blocks and redundant data generated based on the same data (2048 bytes) from the host device. Assume herein that the data blocks D-A1 to D-D1 and redundant data D-P1 belong to a parity group n.

[0389] A write request is inputted through the host interface 1 to the controller 7. In response to the write request, the controller 7 assigns storage locations for the currently-created parity group n. The storage locations for the data blocks are selected from the storage areas in the disk drives 5A to 5D, while the storage location for the redundant data is selected from the storage areas in the disk drive 5P. The controller 7 notifies the SCSI interface 4A of the storage location selected from the storage areas in the disk drive 5A. Similarly, the controller 7 notifies the SCSI interfaces 4B to 4D and 4P of the storage locations selected from the storage areas in the disk drives 5B to 5D and 5P, respectively.

[0390] In response to the notification from the controller 7, the SCSI interface 4A fetches the data block D-A1 from the buffer memory 3A connected thereto, and stores the data block D-A1 in the selected storage area (location) in the disk drive 5A. Similarly, the other SCSI interfaces 4B to 4D store the data blocks D-B1 to D-D1 of the buffer memories 3B to 3D in the selected storage areas (locations) in the disk drives 5B to 5D, respectively. The SCSI interface 4P stores the redundant data D-P1 of the buffer memory 3P in the selected storage area (location) in the disk drive 5P.

[0391] In the disk array device, the above write operation is performed whenever transmission data arrives from the host device. As a result, as shown in FIG. 3b, the data blocks and redundant data of the same parity group are stored in the disk drives 5A to 5D and 5P. For example, for the parity group n (dotted part), the data blocks D-A1, D-B1, D-C1, and D-D1 and the redundant data D-P1 are generated. The data blocks D-A1, D-B1, D-C1, and D-D1 are stored in the disk drives 5A to 5D, while the redundant data is stored in the disk drive 5P. Also for other parity groups, data blocks and redundant data are stored in the disk drives 5A, 5B, 5C, 5D, and 5P, as with the parity group n.

[0392] In the above write operation, the redundant data is stored only in the disk drive 5P, which is a fixed disk drive. As is clear from the above, the write operation has been described based on the RAID-3 architecture. However, the disk array device according to the first embodiment is not restricted to RAID-3, but may be constructed according to the RAID-5 architecture. RAID-5 is different from RAID-3 in that redundant data is not stored in a fixed disk drive, but distributed across the disk drives included in the disk array device.

[0393] To read data from the disk array device, the host device transmits a first read request to the disk array device. The first read request includes information specifying storage locations of the data.

[0394] In response to the first read request, the disk array device starts a read operation that is distinctive of the present embodiment, which is now described in detail with reference to the flow charts in FIGS. 4a and 4b.

[0395] The procedure to be executed by the controller 7 when the first read request arrives is now described with reference to FIG. 4a. The first read request arrives through the host interface 1 at the controller 7 (step S1). The controller 7 extracts the storage locations of the data from the first read request. The controller 7 then specifies the storage locations of the parity group (four data blocks and their redundant data) based on the storage locations of the data. Note that the operation of specifying the storage locations of the parity group from those of the data is known art, and is defined according to the RAID architecture.

[0396] The controller 7 then issues a set of second read requests to read the parity group (step S2). Since the parity group is distributed over the disk drives 5A to 5D and 5P in the first embodiment, the controller 7 issues five second read requests. The second read requests are respectively transmitted to the corresponding SCSI interfaces 4A to 4D and 4P.

[0397] The second read request to the SCSI interface 4A specifies the storage location of the data block in the disk drive 5A, and similarly, the second read requests to the SCSI interfaces 4B to 4D specify the storage locations of the data blocks in the disk drives 5B to 5D, respectively. Further, the second read request to the SCSI interface 4P specifies the storage location of the redundant data in the disk drive 5P.

[0398] The disk drive 5A receives the second read request through the SCSI interface 4A, and then reads the data block from the storage location specified by the second read request. The read data block is transmitted to the SCSI interface 4A. The second read request specifies not only the storage location of the disk drive 5A but also that of the buffer memory 3A. More specifically, the second read request specifies the buffer memory area (refer to FIG. 2) included in the buffer memory 3A in which the read data block is to be stored. The SCSI interface 4A stores the data block read from the disk drive 5A in any one of the buffer areas 3A₁, 3A₂, 3A₃ . . . specified by the second read request. After the data block of 512 bytes is stored in the buffer area 3A_(i) (i is a natural number), the buffer memory 3A sends a “first READ-COMPLETED” to the controller 7 to notify that the read operation from the disk drive 5A has been completed.

[0399] Similarly, the disk drives 5B to 5D each start reading the data block in response to the second read request sent through the corresponding SCSI interfaces 4B to 4D. The data blocks read from the disk drives 5B to 5D are stored through the SCSI interfaces 4B to 4D in the buffer areas 3B_(i) to 3D_(i), respectively. Then, the buffer memories 3B to 3D each transmit a first READ-COMPLETED to the controller 7 to notify that the read operation from the disk drives 5B to 5D has been completed.

[0400] Also, the disk drive 5P starts reading the redundant data after receiving the second read request from the SCSI interface 4P. The read redundant data is stored through the SCSI interface 4P in the buffer area 3P_(i). After the redundant data is stored in the buffer area 3P_(i), the buffer memory 3P transmits a first READ-COMPLETED to the controller 7 to notify that the read operation from the disk drive 5P is completed.

[0401] Note that, in most cases, the first READ-COMPLETED's from the buffer memories 3A to 3D and 3P arrive at the controller 7 at different times. For example, when reading from the disk drive 5A takes a long time, its first READ-COMPLETED arrives at the controller 7 later than those from the other disk drives. As is clear from the above, the first READ-COMPLETED's arrive at the controller 7 in the order in which the reading from the disk drives 5A to 5D and 5P has been completed.

[0402] Referring to FIG. 4b, described next is the procedure to be executed by the controller 7 after four first READ-COMPLETED's arrive. When receiving four first READ-COMPLETED's (step S11), the controller 7 advances to step S12 without waiting for the remaining first READ-COMPLETED. That is, the controller 7 determines that reading from any four of the disk drives 5A to 5D and 5P has been completed, and that reading from the remaining disk drive is delayed.

[0403] The controller 7 then specifies the buffer memory (any one of the buffer memories 3A to 3D and 3P) which has not yet sent a first READ-COMPLETED, to identify the disk drive (any one of the disk drives 5A to 5D and 5P) in which reading has not yet been completed. The controller 7 issues a read-termination command to forcefully terminate the reading being executed by that disk drive (step S12). The read-termination command is sent, through the SCSI interface connected thereto, to the disk drive which has not completed reading, thereby terminating the reading.
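
Steps S11 and S12 can be sketched as below; wait_for_read_completed and send_read_termination are assumed stand-ins for the signalling between the buffer memories, the controller 7, and the SCSI interfaces.

    DRIVES = {"5A", "5B", "5C", "5D", "5P"}

    def read_parity_group(wait_for_read_completed, send_read_termination):
        """Proceed as soon as any four of the five drives report a first
        READ-COMPLETED (step S11), then cancel the straggler (step S12)."""
        completed = set()
        while len(completed) < 4:
            completed.add(wait_for_read_completed())  # yields a drive name
        (straggler,) = DRIVES - completed
        send_read_termination(straggler)
        return completed  # parity recovery is needed iff "5P" is in the set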

[0404] After step S12, the controller 7 determines whether calculation of parity is required or not (step S13). At this time, the controller 7 has received the first READ-COMPLETED's from four of the buffer memories 3A to 3D and 3P. Here, assume that the controller 7 has received the first READ-COMPLETED's from the buffer memories 3A to 3D. In this case, four data blocks are stored in the buffer memories 3A to 3D, and therefore the controller 7 determines that the data requested from the host device can be transmitted. Therefore, the controller 7 determines that calculation of parity is not required, and the procedure directly advances from step S13 to step S16.

[0405] Consider next a case where the controller 7 receives the first READ-COMPLETED from the buffer memory 3P. In this case, the redundant data and three data blocks have been read from the disk drive 5P and three of the disk drives 5A to 5D, but one data block has not yet been read. The controller 7 therefore determines that the data required by the host device cannot be transmitted until the unread data block is recovered. The controller 7 then advances from step S13 to step S14, producing a recovery instruction to request the parity calculator 6 to perform the parity calculation (step S14).

[0406] In response to the recovery instruction, the parity calculator 6 fetches the redundant data and three data blocks from the buffer memory area 3P_(i) and the three buffer memory areas (any three of the buffer areas 3A_(i) to 3D_(i)) which store these data blocks. The parity calculator 6 performs the parity calculation as described in the Background Art section to recover the unread data block from the redundant data and three data blocks. The recovered data block is stored in a buffer memory area 3R_(i) in the buffer memory 3R. When the calculation of parity ends, the parity calculator 6 issues a recovery-completed signal indicating the end of the parity calculation, and transmits the same to the controller 7. When receiving the recovery-completed signal (step S15), the controller 7 determines that four data blocks are stored in the buffer memory areas and that the data requested from the host device can be transmitted. The procedure then advances to step S16.
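
The recovery of steps S14 and S15 is the same byte-wise XOR, applied this time to the redundant data and the three data blocks that were read; a minimal sketch with a hypothetical function name:

    def recover_block(redundant: bytes, blocks):
        """Recover the unread data block from the redundant data and the
        three data blocks stored in the buffer areas (step S14)."""
        missing = bytearray(redundant)
        for block in blocks:  # exactly three data blocks here
            for i, b in enumerate(block):
                missing[i] ^= b
        return bytes(missing)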

[0407] In step S16, the controller 7 generates a “second READ-COMPLETED”, and transmits the same to the selector 2. The second READ-COMPLETED specifies the four buffer memory areas storing the data blocks. In response to the second READ-COMPLETED, the selector 2 sequentially selects the specified buffer memory areas, and sequentially reads the four data blocks therefrom. The selector 2 further assembles data of 2048 bytes from the four read data blocks. The assembled data is transmitted through the host interface 1 to the host device.

[0408] Described next is a specific example of the above-described read processing of the disk array device of the present invention. Here, assume that the host device requests reading of data from the parity group n and then a parity group (n+1) as shown in FIG. 3b. FIG. 5a is a schematic diagram showing read timing of the parity groups n and (n+1) in a time axis.

[0409] The controller 7 first issues a set of second read requests to read the parity group n, and then another set of second read requests to read the parity group (n+1) (step S2 in FIG. 4a). As shown by the dotted parts in FIG. 5a, the disk drive 5D is the first to start reading the data block. The disk drives 5C, 5A, 5P, and then 5B, in this order, start reading the data block or redundant data. Before the lapse of a time t₁, the disk drives 5C, 5A, and 5P have completed the reading. The disk drive 5B is the fourth to complete reading, at the time t₁. However, reading by the disk drive 5D is delayed, and continues after the time t₁.

[0410] Therefore, immediately after the time t₁, four first READ-COMPLETED's from the buffer memories 3A, 3B, 3C, and 3P arrive at the controller 7 (step S11 in FIG. 4b). The controller 7 issues a read-termination command to the disk drive 5D, which has not completed reading (step S12). In response to the read-termination command, the disk drive 5D terminates the reading, as shown in FIG. 5a by X in solid lines.

[0411] The controller 7 then executes steps S13 to S16 of FIG. 4b, as described above.

[0412] Referring back to FIG. 5a, at a time t₂ after the time t₁, the disk drive 5D starts reading the data block of the parity group (n+1) (refer to a vertically-lined part). Before the time t₂, the disk drives 5A, 5C, and 5P have already started reading. The disk drive 5B starts reading slightly after the time t₂. By a time t₃ after the time t₂, the disk drives 5C, 5D, 5A, and 5P have completed reading. Therefore, this time, the reading of the disk drive 5B is forcefully terminated by a read-termination command from the controller 7, as shown by X in broken lines.

[0413] As evident from the above specific example, in the disk array device of the present invention, when four data blocks are stored in the buffer memory areas, the redundant data is not required. When three data blocks and the redundant data are stored, the remaining one data block is not required. The disk array device issues a read-termination command to the disk drive which is reading the unnecessary data block to forcefully terminate the reading (step S12 of FIG. 4b), which is distinctive of the present disk array device.

[0414] To highlight the distinctive characteristics of the present disk array device, described next is read operation by a disk array device which does not execute step S12 of FIG. 4b (hereinafter referred to as a no-termination disk array device), with reference to FIG. 5b. FIG. 5b is a schematic diagram showing read timing of the parity groups n and (n+1) in a time axis in the no-termination disk array device. The conditions in FIG. 5b are the same as those in FIG. 5a except that the no-termination disk array device does not execute step S12 of FIG. 4b. The host device requests data reading from the parity group n, and then the parity group (n+1), under the same conditions as described above.

[0415] The controller 7 issues a set of second read requests in the order in which the first read requests arrive to read data from the parity groups n and (n+1). As shown in FIG. 5b, like in FIG. 5a, reading of the data blocks or redundant data starts in the order of the disk drives 5D, 5C, 5A, 5P, and 5B. The disk drives 5C, 5A, 5P, and 5B have completed reading by the time t₁, as in FIG. 5a, while the disk drive 5D continues reading. Without a read-termination command, reading by the disk drive 5D is not forcefully terminated immediately after the time t₁, and ends at a time t₄ long after the time t₁. Note that the data of the parity group n can be transmitted to the host device at the time t₁, as in FIG. 5a.

[0416] By the time t₄, the disk drives 5A, 5B, 5C, and 5P have already started reading of the data blocks and redundant data of the parity group (n+1). The disk drive 5D, however, starts reading of the data block of the parity group (n+1) at a time t₅ after the time t₄. The disk drives 5C, 5A, and 5P have completed reading by the time t₆, and the disk drive 5B completes reading at the time t₆. Thus, the data of the parity group (n+1) is transmitted immediately after the time t₆.

[0417] In both FIG. 5a and FIG. 5b, with three data blocks and the redundant data available at the time t₁, the data block stored in the disk drive 5D can be recovered, and thus the data of the parity group n can be transmitted to the host device without requiring reading from the disk drive 5D.

[0418] Therefore, as shown in FIG. 5a, the disk array device of the present invention forcefully terminates reading from the disk drive 5D immediately after the time t₁, allowing the disk drive 5D to read the data block of the parity group (n+1) in short order. On the other hand, as shown in FIG. 5b, the no-termination disk array device does not terminate the unnecessary reading from the disk drive 5D between the time t₁ and the time t₄. Due to this time spent on unnecessary reading, as shown in FIG. 5b, reading of the data of the parity group (n+1) is delayed.

[0419] As described above, the disk array device of the present invention terminates incomplete reading of the disk drive, allowing the disk drive to start another reading in short order without continuing unnecessary reading. A reading delay does not affect subsequent reading.

[0420] Further, in FIG. 5a, since the disk drive 5D starts reading the data block at the time t₂, the disk array device can transmit the data of the parity group (n+1) to the host device immediately after the time t₃. Therefore, the disk array device can transmit the required two pieces of data (parity groups n and (n+1)) to the host device immediately after the time t₃. On the other hand, in FIG. 5b, the disk drive 5D does not start reading until the time t₅. This delayed reading affects subsequent reading such that the no-termination disk array device cannot transmit the data of the parity group (n+1) at the time t₃, and thus cannot transmit the required two pieces of data (parity groups n and (n+1)) to the host device at the time t₃.

[0421] As is clear from the above, according to the disk array device of the present invention, the volume of data read from the disk drives 5A to 5D and 5P as a whole (the so-called disk array) per unit of time increases. Therefore, the present disk array device can continuously transmit data to the host device. As a result, video data being replayed at the host device is less likely to be interrupted.

[0422] In some cases, a disk drive of the type shown in FIGS. 6a and 6b is used for the disk drives 5A to 5D and 5P of the first embodiment. FIG. 6a shows physical recording positions of the data blocks or redundant data of the parity groups n to (n+4) in any one of the disk drives. In FIG. 6a, the data block or redundant data of the parity group n is recorded on a track at the innermost radius of the disk. Further, the data block or redundant data of the parity group (n+2) is recorded on the next track, followed by the parity groups (n+4), (n+1), and (n+3), in the direction of the outer radius of the disk.

[0423] Consider that the controller 7 issues second read requests for reading the data block or redundant data to the disk drive of FIG. 6a in the order of the parity groups n, (n+1), (n+2), (n+3), and (n+4). The disk drive of FIG. 6a executes reading so as to shorten the seek distance of the read head, without reading in the order in which the second read requests arrive. For example, the disk drive changes the order of reading so that the read head moves linearly from the inner to the outer radius of the disk. As a result, the data blocks and redundant data are read in the order of the parity groups n, (n+2), (n+4), (n+1), and (n+3). The disk drive thus can efficiently read more data blocks and redundant data per unit of time.
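
The reordering of FIGS. 6a and 6b resembles an elevator-style scan. As a sketch, assuming each queued second read request exposes the track radius of its target block through a hypothetical track_of function:

    def reading_order(queued_requests, track_of):
        """Serve queued requests from the inner to the outer radius so the
        read head moves linearly (n, (n+2), (n+4), (n+1), (n+3) in FIG. 6a)."""
        return sorted(queued_requests, key=track_of)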

[0424] Described next is the read processing of the present disk array device when the above disk drive which changes the order of reading is used for all or part of the disk drives 5A to 5D and 5P shown in FIG. 1. Here, assume that the host device requests data reading in the order of the parity groups n, (n+1), (n+2), (n+3), and (n+4) shown in FIG. 3b. FIG. 7a is a schematic diagram showing read timing of the parity groups n to (n+4) in a time axis in the disk array device of the present invention.

[0425] First, the controller 7 issues second read requests in the requested order. Therefore, the second read requests arrive at each of the disk drives 5A to 5D and 5P in the order of the parity groups n, (n+1), (n+2), (n+3), and (n+4). The disk drives 5A to 5D and 5P, however, determine the order of reading independently, and thus the actual reading order in each disk drive is not necessarily equal to the requested order and may differ from one drive to another. Furthermore, in FIG. 7a, the disk drives 5A, 5B, and 5P have completed reading the data blocks and redundant data of the parity group (n+2) by a time t₇, and the disk drive 5D completes reading the data block of the same parity group at the time t₇ (refer to hatched parts), while the disk drive 5C completes reading the data block of the parity group (n+4) at the time t₇ (refer to a horizontally-lined part). In this case, the controller 7 receives the fourth first READ-COMPLETED for the parity group (n+2) immediately after the time t₇ (step S11 of FIG. 4b). Therefore, a read termination command is sent to the disk drive 5C (step S12), which therefore does not read the data block of the parity group (n+2).

[0426] Similarly, the disk drives 5A, 5B, 5C and 5P have completed reading of the data blocks and redundant data of the parity group (n+4) by a time t₈ (refer to vertically-lined parts). In this case, the controller 7 issues a read termination command for the parity group (n+4) immediately after the time t₈ to the disk drive 5D. The disk drive 5D therefore does not read the data block of the parity group (n+4).

[0427] To highlight the distinctive characteristics of the present disk array device, described next is read operation by a disk array device which does not execute step S12 of FIG. 4b, with reference to FIG. 7b. FIG. 7b is a schematic diagram showing read timing of the parity groups n to (n+4) in a time axis in such a disk array device. The conditions in FIG. 7b are the same as those in FIG. 7a except that the disk array device does not execute step S12 of FIG. 4b. The host device requests data reading from the parity groups n, (n+1), (n+2), (n+3), and then (n+4) sequentially in this order under the same conditions as described above.

[0428] The disk drives 5A to 5D and 5P determine the reading order independently from one another. In FIG. 7b, as in FIG. 7a, the disk drives 5A, 5B, 5D, and 5P have completed reading the data blocks and redundant data of the parity group (n+2) by the time t₇. The disk drive 5C, however, has not yet started reading the data block of the parity group (n+2) by the time t₇. In the no-termination disk array device shown in FIG. 7b, the disk drive 5C is not provided with a read termination command, and therefore will start reading the data block of the parity group (n+2) in the course of time. This reading, however, is unnecessary and a waste of time because the data block of the parity group (n+2) recorded in the disk drive 5C can be recovered at the time t₇.

[0429] Similarly, the disk drives 5A, 5B, 5C and 5P have completed reading the data blocks and redundant data of the parity group (n+4) by the time t₈. The disk drive 5D, however, has not yet started reading the data block of the parity group (n+4), and will start the reading in the course of time. This reading is also unnecessary and a waste of time.

[0430] As is clear from the above, when a data block becomes recoverable, the disk array device of the present invention sends a read termination command to the disk drive which has not yet started reading that data block. In response to the read termination command, the disk drive does not start the unnecessary reading, but executes only necessary reading. Therefore, the present disk array device can quickly transmit the requested data to the host device. In FIG. 7a, four pieces of data of the parity groups n, (n+2), (n+4), and (n+1) can be transmitted to the host device at a time t₉. On the other hand, in FIG. 7b, with the unnecessary reading by the disk drives 5C and 5D, only three pieces of data, of the parity groups n, (n+2), and (n+4), can be transmitted at the time t₉.

[0431] As is clear from the above, according to the disk array device of the present invention, the volume of data to be read per unit of time increases, and data can be continuously transmitted to the host device. As a result, video data being replayed at the host device is less likely to be interrupted.

[0432] The disk drive shown in FIGS. 6a and 6b does not process the second read requests in the arrival order but changes the reading order. In the disk drive, therefore, a plurality of second read requests may wait to be processed. Further, as evident from the above, the controller 7 may need to cancel a second read request which waits to be processed, but in some cases cannot terminate a specific waiting second read request individually. In this case, the controller 7 first terminates the entire processing of the second read requests in the disk drive, and then issues new second read requests excluding the request to be terminated. The controller 7 thus can cancel the specific second read request.
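
The workaround just described amounts to the following sketch, with abort_all and reissue standing in for the drive commands (assumed names, not a specific drive interface):

    def cancel_queued_request(pending, to_cancel, abort_all, reissue):
        """Terminate the entire processing of the queued second read
        requests in the drive, then issue new requests excluding the one
        to be cancelled."""
        abort_all()  # drops every request queued in the disk drive
        for request in pending:
            if request is not to_cancel:
                reissue(request)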

Second Embodiment

[0433] Described next is a disk array device according to a second embodiment of the present invention. The configuration of the disk array device is the same as that shown in FIG. 1. For a clear understanding of the technical effects of the second embodiment, none of the disk drives 5A to 5D and 5P executes reading in the arrival order; instead, each changes the reading order so as to shorten the seek distance (the distance required for seeking) of the read head, as in FIG. 6b.

[0434] The disk array device of the second embodiment performs write operation as described in the first embodiment whenever transmission data from the host device arrives. To read data from the disk array device, the host device transmits a first read request specifying storage locations of the data to the disk array device.

[0435] In response to the first read request, the disk array device starts a read operation that is distinctive of the present embodiment, which is now described in detail with reference to the flow charts in FIGS. 8a and 8b. Since the flow chart in FIG. 8a partially includes the same steps as those in FIG. 4a, the steps in FIG. 8a are provided with the same step numbers as those in FIG. 4a and their description is simplified herein.

[0436] In response to the first read request, the controller 7 issues a set of second read requests (steps S1 and S2). The controller 7 then creates an issue time table 71 as shown in FIG. 9 in its storage area (step S21). As described in the first embodiment, the second read requests sent to the SCSI interfaces 4A to 4D and 4P indicate the buffer memory areas 3A_(i) to 3D_(i) and 3P_(i) (refer to FIG. 2) in which the data blocks or redundant data from the disk drives 5A to 5D and 5P are to be stored, respectively. The issue time table 71 includes the buffer memory areas 3A_(i) to 3D_(i) and 3P_(i) in which the data blocks and redundant data of the parity group to be read are stored, and also the issue time t_(ISSUE) at which the controller 7 issued the second read requests.
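
A minimal in-memory sketch of the issue time table 71 of FIG. 9 (field names hypothetical):

    import time

    issue_time_table = []  # one entry per set of second read requests

    def record_issue(buffer_areas):
        """Step S21: record the target buffer areas and the issue time
        t_ISSUE of the second read requests just issued."""
        issue_time_table.append({
            "buffer_areas": set(buffer_areas),  # e.g. {"3A_i", ..., "3P_i"}
            "t_issue": time.monotonic(),
        })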

[0437] The controller 7 executes processing as described in the first embodiment (refer to FIG. 4b) to transmit the data requested by the host device. Since the processing when four first READ-COMPLETED's arrive does not directly relate to the subject of the second embodiment, its description is omitted herein.

[0438] The controller 7 previously stores a limit time T_(LIMIT) within which four first READ-COMPLETED's have to arrive after the issue time t_(ISSUE). That is, at least four disk drives are supposed to have completed reading within the limit time T_(LIMIT) after the second read requests are issued. If any two of the disk drives 5A to 5D and 5P have not completed reading by the limit time T_(LIMIT), transmission of the data requested by the host device is delayed, causing interruption of the video being replayed at the host device.

[0439] As described in the first embodiment, the disk array device tries to read the data blocks and redundant data from the five disk drives 5A to 5D and 5P. The disk array device, however, can transmit the data requested to be read to the host device when four data blocks, or three data blocks and the redundant data, are stored in the buffer memories. Therefore, the data transmission to the host device is not delayed if at least four disk drives have completed reading before the limit time T_(LIMIT) elapses.

[0440] On the contrary, if two disk drives have not completed reading by the limit time T_(LIMIT), the data transmission to the host device is totally delayed, and reading by the other three disk drives goes to waste. To avoid such waste of reading, the controller 7 executes processing according to the flow chart shown in FIG. 8b.

[0441] The controller 7 first determines whether four first READ-COMPLETED's have arrived by the limit time T_(LIMIT) (step S31). In step S31, the controller 7 obtains a present time t_(PRE) from a time-of-day clock therein at predetermined timing, and selects the issue time t_(ISSUE) in the issue time table 71 shown in FIG. 9. The controller 7 previously stores the limit time T_(LIMIT) as described above. When (t_(PRE) − t_(ISSUE)) > T_(LIMIT) is satisfied, the controller 7 fetches the information on the buffer memory areas 3A_(i) to 3D_(i) and 3P_(i) corresponding to the selected issue time t_(ISSUE) from the issue time table 71 (refer to FIG. 9). As described above, each first READ-COMPLETED includes information on the buffer memory area in which the data block or redundant data is stored. When a first READ-COMPLETED arrives, the controller 7 extracts the information on the buffer memory area included in the first READ-COMPLETED, and stores the same therein.

[0442] The controller 7 then compares the information on the buffer memory areas fetched from the issue time table 71 with the information on the buffer memory area extracted from each first READ-COMPLETED which has arrived at the controller 7. The comparison results allow the controller 7 to determine whether four first READ-COMPLETED's have arrived by the limit time T_(LIMIT) or not.

[0443] In step S31, if four first READ-COMPLETED's have arrived by the limit time T_(LIMIT), the controller 7 deletes the currently-selected issue time table 71 (step S33), and ends the processing of FIG. 8b. If four first READ-COMPLETED's have not yet arrived, the controller 7 specifies one or more disk drives which have not completed reading (any of the disk drives 5A to 5D and 5P) according to the comparison results. The controller 7 issues a read termination command to terminate reading in the specified disk drives (step S32). In response to the read termination command, the specified disk drives terminate the reading currently being executed or not yet started. The controller 7 then deletes the selected issue time table 71 (step S33), and ends the processing.
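
Steps S31 to S33 might be sketched as follows, assuming completed_areas accumulates the buffer areas named in the first READ-COMPLETED's received so far, and drive_of maps a buffer area to its disk drive (both assumptions):

    import time

    T_LIMIT = 0.5  # seconds; an assumed value of the limit time

    def check_limit(table, entry, completed_areas, drive_of, send_read_termination):
        """Once (t_PRE - t_ISSUE) > T_LIMIT, terminate reading in every
        drive of the parity group that has not completed (steps S31-S32),
        then delete the table entry (step S33)."""
        if time.monotonic() - entry["t_issue"] > T_LIMIT:
            done = entry["buffer_areas"] & completed_areas
            if len(done) < 4:  # two or more drives are late
                for area in entry["buffer_areas"] - done:
                    send_read_termination(drive_of(area))
            table.remove(entry)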

[0444] Described next is a specific example of the read operation of the present disk array device with reference to FIG. 10a. Assume that the host device requests data reading of the parity groups n, (n+1), and then (n+2) as shown in FIG. 3b. FIG. 10a is a schematic diagram showing read timing of the parity groups n to (n+2) in a time axis in the present disk array device.

[0445] In response to a request from the host device, the controller 7 issues a set of second read requests for reading data of the parity group n at a time t₁₀ (refer to FIG. 10a). The controller 7 then creates one issue time table 71 of FIG. 9 for the read operation of the parity group n (step S21 in FIG. 8a). This issue time table 71 is hereinafter referred to as an issue time table 71_(n), for convenience of description. The issue time table 71_(n) includes information on the buffer memory areas 3A_(i), 3B_(i), 3C_(i), 3D_(i), and 3P_(i), and also includes the time t₁₀ as the issue time t_(ISSUE). Similarly, second read requests for reading data of the parity group (n+1), and then for the parity group (n+2), are issued after the time t₁₀. An issue time table 71 is created for each of the read operations of the parity groups (n+1) and (n+2).

[0446] The second read requests for the parity groups n, (n+1), and (n+2) are sent to each of the disk drives 5A to 5D and 5P. Each disk drive determines its reading order independently. For example, the disk drive 5A tries to read in the order of the parity groups n, (n+2), and then (n+1); the disk drive 5B as (n+2), n, and then (n+1); the disk drive 5C as (n+2), (n+1), and then n; the disk drive 5D as n, (n+2), and then (n+1); and the disk drive 5P as n, (n+1), and then (n+2). According to these reading orders, as shown in FIG. 10a, the disk drives 5A, 5D, and 5P first start reading the data blocks and redundant data of the parity group n (refer to dotted parts), while the disk drives 5B and 5C start reading the parity group (n+2) (refer to hatched parts).

[0447] Assume that a time t₁₁ equals t₁₀ + T_(LIMIT), at which (t_(PRE) − t_(ISSUE)) > T_(LIMIT) is satisfied. At the time t₁₁, the controller 7 fetches the information on the buffer memory areas 3A₁ to 3D₁ and 3P₁ written with the issue time t_(ISSUE) (t₁₀) from the issue time table 71_(n) (refer to FIG. 9). By the time t₁₁, only the disk drive 5D has completed reading of the data block of the parity group n, and therefore the controller 7 has received only the first READ-COMPLETED specifying the buffer memory area 3D₁ from the buffer memory 3D. The controller 7 thus recognizes that two or more first READ-COMPLETED's have not arrived by the limit time T_(LIMIT) and that reading of the parity group n in the disk drives 5A to 5C and 5P has not yet been completed. The controller 7 thus specifies the disk drives (in this case, the disk drives 5A to 5C and 5P) which are taking too much time to read the data of the parity group n.

[0448] The controller 7 issues a read termination command to the specified disk drives 5A to 5C and 5P (step S32 of FIG. 8b) to terminate reading of the parity group n.

[0449] Accordingly, the disk drives 5A and 5P terminate reading of the parity group n, as shown by X in FIG. 10a immediately after the time t₁₁. As a result, the disk drive 5A starts reading of the parity group (n+2) (refer to a hatched part), while the disk drive 5P starts reading of the parity group (n+1) (refer to a vertically-lined part). In response to the read termination command, the disk drive 5B, which was supposed to read the parity groups (n+2), n, and then (n+1), does not start reading the parity group n, but starts reading the parity group (n+1) after completing reading of the parity group (n+2). The disk drive 5C likewise departs from its predetermined reading order and does not read the data block of the parity group n.

[0450] As described above, in some cases, the controller 7 of the present disk array device detects that two or more data blocks of the same parity group, or at least one data block and the redundant data of the same parity group, are not read within the limit time T_(LIMIT). In this case, the controller 7 specifies the disk drives which have not yet completed reading of the parity group. The controller 7 then issues a read termination command to the specified disk drives to terminate reading. This is the characteristic operation of the present disk array device.

[0451] To highlight this distinctive characteristic of the present disk array device, described next is read processing by a disk array device which does not execute the flow chart of FIG. 8b, with reference to FIG. 10b. FIG. 10b is a schematic diagram showing read timing of the parity groups n to (n+2) on a time axis in the disk array device which does not execute the flow chart of FIG. 8b. The conditions in FIG. 10b are the same as those in FIG. 10a except that the disk array device does not execute the flow chart of FIG. 8b. The host device requests reading of the parity groups n, (n+1), and then (n+2) sequentially in this order under the same conditions as described above.

[0452] The controller 7 issues a set of second read requests for reading the parity group n at a time t₁₀ (refer to FIG. 10b). Similarly, the controller 7 issues second read requests for reading the parity group (n+1), and then (n+2), after the time t₁₀.

[0453] The disk drives 5A to 5D and 5P determine their reading orders independently. Assume herein that the reading orders are the same as described for the disk array device of the second embodiment. According to these reading orders, as shown in FIG. 10b, the disk drives 5A to 5D and 5P start reading the data blocks and redundant data of the parity groups n, (n+1), and (n+2).

[0454] As described above, this disk array device does not execute the processing shown in FIG. 8b. Therefore, the disk drives 5A and 5P do not terminate read operation even though they take longer than the limit time T_(LIMIT) to read the parity group n. Furthermore, it is highly possible that the data block and redundant data of the parity group n stored in the disk drives 5A and 5P have a failure. Therefore, the disk array device cannot assemble and transmit the data of the parity group n. Note that, despite this, the disk drives 5B and 5C still start unnecessary reading of the data blocks of the parity group n.

[0455] As evident from FIGS. 10a and 10b, with execution of the processing of FIG. 8b, on realizing that data being read cannot be transmitted to the host device, the disk array device of the second embodiment terminates all reading of that parity group. Therefore, in the case of FIG. 10a, the disk drives 5A, 5B, 5C, and 5P can start reading the next parity group earlier than in the case of FIG. 10b, thereby terminating unnecessary reading and quickly starting the next reading. Further, the disk drives 5B and 5C skip reading of the parity group whose data cannot be transmitted to the host device, and start reading of the next parity group. As a result, the disk array device can read a larger volume of data per unit of time, and thus continuously transmit data to the host device, making video data being replayed at the host device less likely to be interrupted.

Third Embodiment

[0456] In the previous embodiments, the controller 7 immediately issues a recovery instruction to the parity calculator 6 after three data blocks and the redundant data are stored in the buffer memories. However, the calculation of parity requires a large amount of arithmetic operation, and the more often the calculation of parity is performed, the more heavily the disk array device is loaded. In the disk array device of a third embodiment, the controller 7 controls the timing of issuing a recovery instruction to reduce the number of times the calculation of parity is performed.

[0457] FIG. 11 is a block diagram showing the disk array device according to the third embodiment. The disk array device of FIG. 11 is different from that of FIG. 1 in that the controller 7 includes a first timer 72. Since the other structures are the same, the components in FIG. 11 are provided with the same reference numerals as those of FIG. 1 and their description is simplified herein.

[0458] The disk array device performs write operation as described in the first embodiment whenever transmission data arrives from the host device. To read data from the disk array device, the host device transmits a first read request specifying storage locations of the data to the disk array device.

[0459] In response to the first read request, the disk array device starts read operation that is distinctive of the third embodiment, which is now described in detail with reference to the flow charts of FIGS. 12a and 12b. Note that since the flow chart of FIG. 12a is equal to that of FIG. 8a, the steps in FIG. 12a are provided with the same step numbers as those in FIG. 8a. Through the execution of the flow chart of FIG. 12a, the controller 7 issues a set of second read requests (requests for reading a parity group) (steps S1 and S2), and further creates the issue time table 71 for the issued second read requests (step S21).

[0460] The second read requests issued by the processing of FIG. 12a are transmitted to the disk drives 5A to 5D and 5P as described in the first embodiment. In response to the second read request, each disk drive reads the data block or redundant data. The read data blocks and redundant data are stored through the SCSI interfaces 4A to 4D and 4P in the buffer memories 3A to 3D and 3P. After storing, each buffer memory transmits a first READ-COMPLETED to the controller 7 notifying that reading has been completed.

[0461] If four first READ-COMPLETED's have arrived (step S11 of FIG. 12b) by a time t_(4th), the controller 7 detects and stores the time t_(4th) (step S41). The controller 7 then determines whether reading of the redundant data has been completed or not (step S42).

[0462] If reading of the redundant data has not yet been completed (that is, if the first READ-COMPLETED's from the buffer memories 3A to 3D have arrived), this reading is unnecessary. The controller 7 therefore issues a read termination command to terminate the unnecessary reading (step S12), and then issues a second READ-COMPLETED (step S16). In response to the second READ-COMPLETED, the selector 2 fetches the data blocks from the buffer memories 3A to 3D to assemble the data to be transmitted to the host device. The selector 2 transmits the assembled data through the host interface 1 to the host device.

[0463] In step S42, if the redundant data has been completely read (that is, if the first READ-COMPLETED is received from the buffer memory 3P), the procedure advances to step S43, wherein the controller 7 calculates a timeout value V_(TO1) to which the first timer 72 is to be set. The timeout value V_(TO1) is described in detail below.

[0464] Now, assume the following simulation is performed on the disk array device. In this simulation, second read requests are issued many times to one of the disk drives 5A to 5D and 5P from the controller 7, and the corresponding first READ-COMPLETED's arrive at the controller 7. A time t from issuance of a second read request to arrival of the corresponding first READ-COMPLETED is measured in the simulation. The time t can be regarded as the time required for reading in one disk drive. Since the measured time t varies within a certain deviation, a probability distribution curve f(t) can be obtained as shown in FIG. 13a. In FIG. 13a, the horizontal axis indicates the time t, while the vertical axis indicates the probability density f(t) of reading being completed at the time t.

[0465] Therefore, the probability P(t) that the first READ-COMPLETED has arrived by the time t after issuance of the second read request is given by

P(t)=∫₀ᵗ f(τ)dτ.

[0466] Since the present disk array device includes five disk drives, the probability P_(all)(t) that five first READ-COMPLETED's have arrived by the time t after issuance of the second read requests for one parity group is given by

P_(all)(t)={P(t)}⁵.

[0467] Here, assuming that the time t at which the probability P_(all)(t) becomes a predetermined probability P₀ is t₀, P_(all)(t₀)=P₀. Appropriate values are selected for t₀ and P₀ according to the design specification of the disk array device so that the disk array device can ensure successive data transmission to the host device. In other words, t₀ and P₀ are values that can ensure that video being replayed at the host device is not interrupted.
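
For illustration only, the completion-expectation value t₀ can be computed numerically once f(t) is known. The following Python sketch assumes, purely for illustration, that the per-drive read time is normally distributed with a mean of 10 ms and a standard deviation of 3 ms; it solves {P(t₀)}⁵=P₀ for t₀.

    from statistics import NormalDist

    # Assumed per-drive read-time distribution f(t) (illustrative values only).
    read_time = NormalDist(mu=0.010, sigma=0.003)

    P0 = 0.99                      # required probability P0 for the whole group
    p_single = P0 ** (1 / 5)       # from P_all(t0) = {P(t0)}^5 = P0
    t0 = read_time.inv_cdf(p_single)   # completion-expectation value t0

    print(f"per-drive P(t0) = {p_single:.4f}, t0 = {t0 * 1000:.2f} ms")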

[0468] As evident from the above, in the present disk array device, it is expected with the probability P₀ that reading of one parity group has been completed by the time t₀ after issuance of the second read requests. This time t₀ is hereinafter referred to as a completion-expectation value t₀. The controller 7 previously stores the completion-expectation value t₀ for calculating the timeout value V_(TO1).

[0469] When four first READ-COMPLETED's have arrived at the controller 7, the progress of reading in the disk drives 5A to 5D and 5P is, for example, as shown in FIG. 13b. In FIG. 13b, the second read requests issued at the time t_(ISSUE) cause each disk drive to start reading. The disk drives 5A, 5B, 5D, and 5P have completed reading by a time t_(4th).

[0470] Here, since reading of one parity group is expected, with the probability P₀, to have been completed by the completion-expectation value t₀ measured from the time t_(ISSUE), reading of the disk drive 5C is expected to have been completed by a time (t_(ISSUE)+t₀), as shown in FIGS. 13a and 13b, with the probability P₀.

[0471] Therefore, in step S43, the controller 7 first fetches the time t_(4th) stored in step S41, the time t_(ISSUE) in the issue time table 71, and the previously-stored completion-expectation value t₀. Then, {t₀−(t_(4th)−t_(ISSUE))} is calculated, resulting in a time margin T_(MARGIN) as shown by a hollow double-headed arrow in FIG. 13b. The controller 7 sets the first timer 72 to the calculated time margin T_(MARGIN) as the timeout value V_(TO1) (step S43 in FIG. 12b). This activates the first timer 72 to start countdown.
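
In numbers, step S43 reduces to one subtraction. A minimal sketch with hypothetical values, continuing the assumptions of the previous sketch:

    # Assumed values: t0 = 18.6 ms, t_issue = 0.0 ms, t_4th = 12.0 ms.
    t0, t_issue, t_4th = 0.0186, 0.0, 0.0120
    t_margin = t0 - (t_4th - t_issue)   # timeout value V_TO1 for the first timer 72
    print(f"T_MARGIN = {t_margin * 1000:.1f} ms")   # 6.6 ms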

[0472] The controller 7 then determines whether the remaining first READ-COMPLETED arrives (step S44). In other words, the controller 7 determines whether the remaining reading of the data block has been completed and four data blocks have been stored in the buffer memories.

[0473] With reference to FIG. 14a, if four data blocks have been stored, all data blocks of the disk drives 5A to 5D have been stored in the buffer memories before the time margin T_(MARGIN) calculated based on the time t_(4th) is consumed (that is, by the time (t_(ISSUE)+t₀)). Further, reading of the redundant data has also been completed. Therefore, the controller 7 is not required to issue a read termination command, and the procedure directly advances from step S44 to step S16. In step S16, the controller 7 issues a second READ-COMPLETED. In response to the second READ-COMPLETED, the selector 2 fetches the data blocks from the buffer memories 3A to 3D to assemble the data to be transmitted to the host device. The selector 2 then transmits the assembled data through the host interface 1 to the host device. The first timer 72 stops countdown, as required.

[0474] On the other hand, in step S44, when the remaining first READ-COMPLETED has not yet arrived, the controller 7 determines whether the first timer 72 is timed-out (step S45). In other words, the controller 7 determines whether the time margin T_(MARGIN) has elapsed from the time t_(4th).

[0475] When the first timer 72 is not timed-out, the procedure returns to step S44, wherein the controller 7 determines again whether the remaining first READ-COMPLETED arrives.

[0476] On the other hand, when the first timer 72 is timed-out, the controller 7 recognizes that reading of the remaining one data block has not been completed after a lapse of the time margin T_(MARGIN) from the time t_(4th). In FIG. 14b, the disk drive 5C is still reading the data block. After a lapse of the time margin T_(MARGIN), the controller 7 determines that the data cannot be continuously transmitted if it waits any longer for the remaining reading to complete. Then, the procedure advances from step S45 to step S14 of FIG. 12b, wherein the controller 7 issues a recovery instruction to the parity calculator 6 immediately after the time (t_(ISSUE)+t₀) to request execution of the calculation of parity. After ending the calculation of parity, the parity calculator 6 issues a RECOVERY-COMPLETED indicating that recovery has been completed, and transmits the same to the controller 7. On receiving the RECOVERY-COMPLETED (step S15), the controller 7 determines that four data blocks have been stored in the buffer memories and that the data requested from the host device can be transmitted. The controller 7 then issues a read termination command to terminate the unnecessary reading in the remaining disk drive (step S12). The controller 7 then issues a second READ-COMPLETED (step S16). In response to the second READ-COMPLETED, the selector 2 fetches the data blocks from the buffer memories 3A to 3D to assemble the data to be transmitted to the host device. The selector 2 transmits the assembled data through the host interface 1 to the host device.

[0477] As described above, the disk array device of the third embodiment is different from that of the first embodiment in that an unread data block is not recovered immediately after four first READ-COMPLETED's arrive. In other words, the disk array device of the present embodiment waits, for up to the time margin T_(MARGIN) after four first READ-COMPLETED's arrive, until reading of the remaining data block has been completed. A recovery instruction is issued to the parity calculator 6 only after a lapse of the time margin T_(MARGIN). When the remaining data block is read within the time margin T_(MARGIN), four data blocks are stored in the buffer memories, which allows the disk array device to transmit data to the host device without performing the calculation of parity. Note that the time margin T_(MARGIN) is calculated, as described above with reference to FIG. 13a, based on the value t₀, which ensures that video being replayed at the host device is not interrupted. Furthermore, the time margin T_(MARGIN) indicates a time period within which reading of the remaining data block is expected to have been completed. Therefore, in most cases, four data blocks are stored in the buffer memories 3A to 3D within the time margin T_(MARGIN). The present disk array device thus seldom performs the calculation of parity, which requires a large amount of arithmetic operation, minimizing the number of times it is performed.

[0478] Moreover, since the probability that the redundant data has not yet been read by the time the fourth first READ-COMPLETED arrives is 1/5, the present disk array device can, with that 1/5 probability, transmit data to the host device quickly without performing the calculation of parity at all.

Fourth Embodiment

[0479] The foregoing embodiments issue a recovery instruction without consideration of the present state of the parity calculator 6. Therefore, the controller 7 may issue the next recovery instruction to the parity calculator 6 while the parity calculator 6 is still performing a calculation of parity. The parity calculator 6, however, can process only one recovery instruction at a time, and cannot accept another one while a calculation is in progress. In a disk array device according to a fourth embodiment of the present invention, the controller 7 controls the timing of issuing recovery instructions so as not to issue a new recovery instruction during a calculation of parity.

[0480] FIG. 15 is a block diagram showing the disk array device according to the fourth embodiment of the present invention. The disk array device of FIG. 15 is different from that of FIG. 1 in that the controller 7 further includes a reservation table 73 and a second timer 74. Since the other structures are the same, the components in FIG. 15 are provided with the same reference numerals as those in FIG. 1 and their description is simplified herein.

[0481] The disk array device of the fourth embodiment performs write operation as described in the first embodiment whenever transmission data from the host device arrives. To read data from the disk array device, the host device transmits a first read request specifying storage locations of the data to the disk array device.

[0482] In response to the first read request, the disk array device starts read operation that is distinctive of the present embodiment, which is now described in detail with reference to the drawings.

[0483] As shown in FIG. 12a, the first read request causes the controller 7 to issue a set of second read requests (requests for reading a parity group) (steps S1 and S2). Further, the issue time table 71 of FIG. 9 is created for the issued second read requests (step S21).

[0484] The second read requests issued by the processing shown in FIG. 12a are transmitted to the disk drives 5A to 5D and 5P, as described in the first embodiment. In response to the second read request, each disk drive reads the data block or redundant data. The read data blocks are stored through the SCSI interfaces 4A to 4D in the buffer memories 3A to 3D, and the read redundant data is stored through the SCSI interface 4P in the buffer memory 3P. After storing the data block or redundant data, each buffer memory transmits a first READ-COMPLETED to the controller 7 to notify that reading by the corresponding disk drive is completed.

[0485] Further, the controller 7 regularly performs the procedure shown in a flow chart of FIG. 16. Since the flow chart of FIG. 16 partially includes the same steps as that of FIG. 12b, the same steps in FIG. 16 are provided with the same step numbers as those in FIG. 12b, and their description is omitted herein.

[0486] When four first READ-COMPLETED's arrive (step S11 of FIG. 16), the controller 7 stores the arrival time t_(4th) in its storage area (step S41). The controller 7 then determines whether the redundant data has been read or not (step S42).

[0487] If the redundant data has not yet been read, as described in the third embodiment, the controller 7 terminates the unnecessary reading in the disk drive 5P (step S12), and then issues a second READ-COMPLETED (step S16). As a result, the data assembled by the selector 2 is transmitted through the host interface 1 to the host device.

[0488] Further, if the redundant data has already been read in step S42, the parity calculator 6 may need to perform a calculation of parity. For this calculation of parity, the controller 7 writes the necessary information in the reservation table 73 (step S51). As shown in FIG. 17, a use time period and buffer memory areas are written as the necessary information in the reservation table 73. The use time period indicates that the controller 7 uses the parity calculator 6 during that period. The buffer memory areas indicate the storage locations of the data blocks and redundant data to be used by the parity calculator 6. The controller 7 registers the information on the buffer memories included in the first READ-COMPLETED's obtained in step S11 in the reservation table 73 (step S51).

[0489] In step S51, the start time and the end time of the calculation of parity are registered in the reservation table 73. The controller 7 then calculates a timeout value V_(TO2) from the start time t_(s) of the calculation of parity and the fourth arrival time (present time) t_(4th) as t_(s)−t_(4th). The controller 7 then sets the second timer 74 to the calculated timeout value V_(TO2) (step S52). This activates the timer 74 to start countdown. By the time the timer 74 is timed-out, the parity calculator 6 has completed the calculation of parity in progress and is capable of receiving the next one. That is, at that timeout, the controller 7 can issue another recovery instruction.
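
For illustration only, the bookkeeping of steps S51 and S52 can be sketched as follows; the Reservation type and reserve helper are assumptions of this sketch, and V_(TO2)=t_(s)−t_(4th) follows the description above.

    from dataclasses import dataclass

    @dataclass
    class Reservation:            # one row of the reservation table 73
        t_start: float            # start time t_s of the calculation of parity
        t_end: float              # end time of the calculation of parity
        buffers: tuple            # buffer memory areas to be used

    def reserve(table, t_4th, duration, buffers):
        """Steps S51-S52: append a reservation and compute the timeout V_TO2."""
        # The new use period begins when the last registered calculation ends,
        # or at t_4th if the parity calculator 6 is already idle.
        t_start = max(max((r.t_end for r in table), default=t_4th), t_4th)
        table.append(Reservation(t_start, t_start + duration, buffers))
        return t_start - t_4th    # V_TO2; zero when the calculator is idle

    table = [Reservation(0.0, 0.012, ("3A", "3B", "3C", "3P"))]  # calc in progress
    v_to2 = reserve(table, t_4th=0.010, duration=0.004, buffers=("3A", "3C", "3D", "3P"))
    print(f"V_TO2 = {v_to2 * 1000:.1f} ms")   # 2.0 ms until the calculator is free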

[0490] The controller 7 next determines whether the remaining first READ-COMPLETED has arrived or not (step S44).

[0491] If the remaining first READ-COMPLETED has arrived, all four data blocks have been stored in the buffer memories before the timer 74 is timed-out. Therefore, the calculation of parity is not required. The time period for using the parity calculator 6 is, however, still written in the reservation table 73. The controller 7 therefore deletes the information on the use time period and the buffer memories registered in step S51 (step S53).

[0492] Further, since reading of the redundant data has also been completed, the controller 7 is not required to issue a read termination command. The controller 7 therefore issues a second READ-COMPLETED (step S16). As a result, the data assembled by the selector 2 is transmitted through the host interface 1 to the host device. The timer 74 terminates countdown as required.

[0493] If the remaining first READ-COMPLETED has not yet arrived in step S44, the controller 7 determines whether the timer 74 is timed-out or not (step S54). In other words, the controller 7 determines whether the timeout value V_(TO2) has elapsed from the time t_(4th) or not.

[0494] When the timer 74 is not timed-out, the procedure returns to step S44, wherein the controller 7 determines again whether the remaining first READ-COMPLETED has arrived or not.

[0495] On the other hand, when the timer 74 is timed-out, the controller 7 realizes that reading of the remaining data block has not been completed before the timeout value V_(TO2) elapsed from the time t_(4th), and that the parity calculator 6 is now available. The procedure advances from step S54 to step S12, wherein the controller 7 terminates the unnecessary reading in the remaining disk drive. Further, the controller 7 issues a recovery instruction to request the parity calculator 6 to perform the calculation of parity (step S14). After the calculation of parity ends, the parity calculator 6 issues a RECOVERY-COMPLETED indicative of the end of the calculation of parity, and transmits the same to the controller 7. When receiving the RECOVERY-COMPLETED (step S15), the controller 7 realizes that the information on the use time period and the buffer memory areas registered in step S51 is no longer necessary. The controller 7 therefore deletes the unnecessary information from the reservation table 73 (step S53).

[0496] Moreover, on receiving the RECOVERY-COMPLETED, the controller 7 determines that four data blocks have been stored in the buffer memories and that the data requested from the host device can now be transmitted. The controller 7 then issues a second READ-COMPLETED (step S16). As a result, the data assembled by the selector 2 is transmitted through the host interface 1 to the host device.

[0497] The general read operation of the present disk array device has been described in the foregoing. Now described is a specific example of the read operation of the present disk array device with reference to FIGS. 16 and 18. Assume that the host device requests data reading in the order of the parity groups n, (n+2), and then (n+4) of FIG. 3b. FIG. 18 is a schematic diagram showing the timing of reading the parity groups n, (n+2), and (n+4), and the reservation state of the parity calculator 6, on a time axis in the present disk array device.

[0498] The second read requests for the parity groups n, (n+2), and (n+4) are sent to each of the disk drives 5A to 5D and 5P. For simplicity of description, assume that each disk drive reads the parity groups in the order in which the second read requests arrive. Also assume that the reservation table 73 includes information that a currently-operating calculation of parity will end at a time t₁₂ (refer to a lower-leftward hatched part).

[0499] Under the above conditions, each disk drive first executes reading of the parity group n. In FIG. 18, the disk drive 5B completes reading at the time t₁₂, and therefore the fourth first READ-COMPLETED arrives at the controller 7 at the time t₁₂ (step S11 of FIG. 16). The controller 7 stores the time t₁₂ as the arrival time t_(4th) (step S41). Further, since the disk drive 5P has already completed reading of the redundant data, the controller 7 executes step S51 to register a time period t₁₃ to t₁₄ as the use time period in the reservation table 73 shown in FIG. 17. The controller 7 also registers 3A_(i), 3B_(i), 3C_(i), and 3P_(i) as the buffer memory areas (step S51). The controller 7 calculates a timeout value V_(TO2) (T₁=t₁₃−t₁₂), and sets the second timer 74 to the timeout value V_(TO2) (step S52).

[0500] At the time t₁₂, the disk drive 5D is still reading the data block. However, assume that this reading will not have been completed by the time t₁₃. In this case, when the timer 74 is timed-out, the controller 7 terminates the reading of the disk drive 5D, and issues a recovery instruction to the parity calculator 6 (steps S12 and S14). The parity calculator 6 recovers the data block recorded in the disk drive 5D between the times t₁₃ and t₁₄. Since a RECOVERY-COMPLETED from the parity calculator 6 arrives at the controller 7 at the time t₁₄ (step S15), the controller 7 deletes the information on the use time period t₁₃ to t₁₄ and the buffer memory areas 3A_(i), 3B_(i), 3C_(i), and 3P_(i) from the reservation table 73 (step S53). The controller 7 then issues a second READ-COMPLETED (step S16).

[0501] After completing reading of the parity group n, each disk drive starts reading of the parity group (n+2). In FIG. 18, since a first READ-COMPLETED from the disk drive 5D arrives at the controller 7 at a time t₁₅, the controller 7 stores the time t₁₅ as the arrival time t_(4th) (steps S11 and S41). Furthermore, since the redundant data has already been read by the time t₁₅, the controller 7 writes the use time period t₁₅ to t₁₈ and the identifiers of the buffer memory areas 3A_(i), 3C_(i), 3D_(i), and 3P_(i) (step S51). Note that the time t₁₅ is after the time t₁₄, and the parity calculator 6 is not performing a calculation of parity at the time t₁₅. The timeout value V_(TO2) is therefore “0” (step S52). The controller 7 immediately terminates the currently-executing reading in the disk drive 5B, and then issues a recovery instruction to the parity calculator 6 (steps S12 and S14). The following operation is evident from the above description and therefore its description is omitted herein.

[0502] After completing reading of the parity group (n+2), each disk drive starts reading of the parity group (n+4). A first READ-COMPLETED from the disk drive 5D arrives at the controller 7 at a time t₁₆ (before the time t₁₈). Since the redundant data has already been read by the time t₁₆, the controller 7 writes the time period t₁₈ to t₁₉ as the use time period in the reservation table 73. The controller 7 also writes 3A_(i), 3C_(i), 3D_(i), and 3P_(i) as the identifiers of the buffer memory areas. Further, the controller 7 calculates a timeout value V_(TO2) (T₂=t₁₈−t₁₆), and sets the timeout value V_(TO2) in the second timer 74 (step S52).

[0503] Note, however, that a first READ-COMPLETED from the disk drive 5B arrives at the controller 7 at a time t₁₇ (before the time t₁₈). In other words, the remaining first READ-COMPLETED arrives at the controller 7 before the timer 74 is timed-out. Therefore, the controller 7 does not issue a recovery instruction, and the parity calculator 6 does not perform the calculation of parity which was supposed to be executed between the times t₁₈ and t₁₉ (refer to X by dotted lines). The controller 7 then deletes the use time period t₁₈ to t₁₉ and the identifiers of the buffer memory areas 3A_(i), 3C_(i), 3D_(i), and 3P_(i) from the reservation table 73 (step S53), and issues a second READ-COMPLETED (step S16).

[0504] As described above, the disk array device of the fourth embodiment is different from that of the first embodiment in that, when four first READ-COMPLETED's arrive, the use time period of the parity calculator 6 is written in the reservation table 73. The use time period written therein begins after the calculation of parity currently being executed ends. Since the controller 7 issues a recovery instruction only within that time period, the controller 7 does not issue any recovery instruction during a calculation of parity, thereby preventing an overload on the disk array device.

[0505] Moreover, when the remaining data block arrives by the time the timer 74 is timed-out, the controller 7 does not issue any recovery instruction, but issues a second READ-COMPLETED to assemble the data from the four data blocks and transmit the same to the host device. Therefore, the disk array device can minimize the number of times it performs the calculation of parity, which requires a large amount of arithmetic operation.

Fifth Embodiment

[0506] FIG. 19 is a block diagram showing a disk array device according to a fifth embodiment of the present invention. The disk array device of FIG. 19 is different from that of FIG. 1 in that the controller 7 further includes a faulty block table 75. Since the other structures are the same, the components in FIG. 19 are provided with the same reference numerals as those in FIG. 1 and their description is simplified herein. Note that the present disk array device does not always require the issue time table 71.

[0507] Also note that the data blocks and redundant data are not stored in the disk drives 5A to 5D and 5P in the way shown in FIGS. 3a and 3b. The disk array device is instead constructed based on the level 5 architecture. In the level-5 disk array device, the redundant data is not stored in a fixed drive (refer to FIGS. 3a and 3b), but distributed across the disk drives 5A to 5D and 5P as shown in FIG. 20.

[0508] To read data from the disk array device, the host device transmits a first read request to the disk array device. The first read request specifies storage locations of the data.

[0509] In response to the first read request, the disk array device starts read operation that is distinctive of the present embodiment, which is now described in detail with reference to a flow chart in FIG. 21. Since FIG. 21 partially includes the same steps as those in FIG. 2a, the same steps in FIG. 21 are provided with the same step numbers as those in FIG. 2a and their description is simplified herein.

[0510] The first read request is sent to the controller 7 through the host interface 1 (step S1). The controller 7 extracts the storage locations of the data from the first read request. According to the storage locations of the data, the controller 7 specifies the storage locations of the parity group (four data blocks and redundant data) generated based on that data. Note that the processing of obtaining the storage locations of the parity group from those of the data is known art, and is defined according to the RAID architecture.

[0511] The controller 7 then determines whether reading of any of the four data blocks to be read this time has previously failed in the disk drives 5A to 5D and 5P (step S61). For the determination of step S61, the faulty block table 75 is referred to. The storage locations of the data blocks which failed to be read are listed in the faulty block table 75 as shown in FIG. 22. Alternatively, the storage locations of the data blocks whose reading had to be retried, or of those which were successfully read but took more than a predetermined time period, may be listed in the faulty block table 75.

[0512] If none of the four data blocks has previously failed to be read, the controller 7 determines that there is a low possibility of failing to read the four data blocks this time, and issues a set of second read requests to read the parity group (step S62). In step S62, note that the second read requests are issued only to the four disk drives in which the data blocks are recorded, and not to the remaining disk drive in which the redundant data is recorded.

[0513] If any of the four data blocks has previously failed to be read, the controller 7 determines that there is a high possibility of failing to read the four data blocks this time as well, and issues a set of second read requests to read the parity group (step S63). In step S63, note that the second read requests are issued both to the four disk drives in which the data blocks are recorded and to the remaining disk drive in which the redundant data is recorded.
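
For illustration only, the determination of step S61 and the two issuing branches of steps S62 and S63 can be sketched as follows, with the faulty block table 75 modeled as a set of LBA's (the helper names are assumptions of this sketch):

    def issue_second_reads(issue_read, data_lbas, parity_lba, faulty_blocks):
        """Steps S61-S63: read the redundant data only when a data block is risky."""
        risky = any(lba in faulty_blocks for lba in data_lbas)    # step S61
        targets = list(data_lbas) + ([parity_lba] if risky else [])
        for lba in targets:               # four requests (S62) or five (S63)
            issue_read(lba)
        return len(targets)

    faulty_blocks = {4711}                # faulty block table 75 as a set of LBA's
    n = issue_second_reads(print, [100, 4711, 300, 400], 500, faulty_blocks)
    print(n, "second read requests issued")   # 5, since LBA 4711 failed before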

[0514] When the first READ-COMPLETED's from the disk drives 5A to 5D and 5P arrive, the controller 7 performs the operation shown in FIG. 2b. When reading of any data block fails during this operation, the storage location of that data block is added to the faulty block table 75.

[0515] As evident from the above, in the fifth embodiment, the number of second read requests to be issued varies depending on the determination result in step S61. Such second read requests bring technical effects as shown in FIGS. 23a and 23b. FIG. 23a shows a case in which, as described in the previous embodiments, a set of five second read requests is always issued, while FIG. 23b shows a case in which a set of four second read requests is issued, for clarification of the technical effects of the present embodiment.

[0516] In FIG. 23a, the redundant data is read every time. Therefore, assuming a time required for reading one data block (or redundant data) is T, 5×T is required for reading the parity groups n to (n+4). In FIG. 23b, however, the redundant data is not read. Therefore, while four disk drives are reading one parity group, the remaining disk drive can execute reading of another parity group. The present disk array device thus may read the parity groups n to (n+4) in a period shorter than 5×T. FIG. 23b shows the fastest case, in which the disk array device reads these parity groups in a time period 4×T.

[0517] As described above, in the present disk array device, the redundant data is read only when a data block which has previously failed to be read is to be read this time. Therefore, as described with reference to FIGS. 23a and 23b, the present disk array device can read a larger volume of data per unit of time. Furthermore, since the redundant data is read when there is a high possibility of failing to read the data blocks, the present disk array device can readily perform the calculation of parity when reading actually fails, and transmit data to the host device as soon as possible.

Sixth Embodiment

[0518] One of the reasons why reading is delayed in any of the disk drives 5A to 5D and 5P is that a defect occurs in a recording area of the disk drive. If the data block or redundant data remains stored in such a defective area, reading of the data block or redundant data will be delayed every time. Therefore, in a sixth embodiment, a disk array device for executing so-called reassign processing is realized. Here, reassign processing means that an alternate recording area (hereinafter referred to as an alternate area) is assigned to replace a defective recording area (hereinafter referred to as a defective area), and the data block or redundant data stored in the defective area is stored again in the newly-assigned alternate area.

[0519] FIG. 24 is a block diagram showing the disk array device according to the sixth embodiment of the present invention. The disk array device is different from the disk array device of FIG. 1 in that a reassignment part 8, a first table storage part 9, a second table storage part 10, and an address conversion part 11 are further included. With the addition of the reassignment part 8, functions that are different from those in the previous embodiments are added to the SCSI interfaces 4A to 4D and 4P. These new functions of the SCSI interfaces are not shown in FIG. 24, as space does not allow detailed illustration, but are shown later in FIG. 29. Other than that, the disk array device has the same structures as those of the first embodiment. Therefore, the components in FIG. 24 are provided with the same reference numerals as those in FIG. 1 and their description is simplified herein. Note that, even though not shown in FIG. 24, the first timer 72 as described in the third embodiment is included in the controller 7.

[0520] As is known, each of the disk drives 5A to 5D and 5P manages its own recording area by sector units of a predetermined size (512 bytes, in the present embodiment). A number called an LBA is assigned to each sector. LBA is an acronym for Logical Block Address. At initialization of the disk array device, part of the sectors in the recording areas of the disk drives are allocated as the alternate areas. The first table storage part 9 holds a first table 91, shown in FIG. 25, to manage such alternate areas. In FIG. 25, the LBA's specifying the allocated alternate areas are registered in the first table 91.

[0521] The host device (not shown) is placed outside the disk array device and connected to the host interface 1, and requests the disk array device to write or read data. The RAID device performs the same write operation as described in the first and other embodiments. When the disk array device is configured based on the RAID-3 architecture as shown in FIG. 3, the redundant data is recorded only in the fixed disk drive 5P. When the disk array device is configured based on the RAID-5 architecture as shown in FIG. 20, the redundant data is distributed across the disk drives 5A to 5D and 5P. Note that the data blocks and redundant data are written in the areas other than the alternate areas when reassignment is not performed.

[0522] The host device transmits a first read request to the RAID device to request reading of the data of a parity group, as described in the previous embodiments. To request reading of five parity groups n to (n+4) (refer to FIGS. 3a and 3b), the host device has to transmit five first read requests to the RAID device. Each first read request includes information specifying the storage locations of the parity group to be read, as described above. In the sixth embodiment, the LBA's are used as the information specifying the storage locations.

[0523] In response to the first read request, the present disk array device starts read operation that is distinctive of the sixth embodiment, which is now described with reference to FIG. 26. FIG. 26 is a flow chart showing the procedure of the controller 7 after the first read request arrives. Since the flow chart of FIG. 26 partially includes the same steps as those of FIG. 12, the steps of FIG. 26 are provided with the same step numbers as those of FIG. 12 and their description is simplified herein.

[0524] A first read request arrives at the controller 7 through the host interface 1 (step S1 in FIG. 26). The controller 7 extracts from the first read request the LBA's as information indicating the storage locations of the parity group to be read this time. The controller 7 notifies the address conversion part 11 of the extracted LBA's (step S71). The address conversion part 11 executes the arithmetic operation defined by RAID-3 or RAID-5, deriving the original LBA's of the data blocks and redundant data from the storage locations (LBA's) of the parity group obtained from the controller 7. The original LBA's indicate the storage locations on the disk drives 5A to 5D and 5P in which the data blocks and redundant data were stored by the disk array device upon the write request from the host device.

[0525] Described below is the arithmetic operation executed by the address conversion part 11. Since the present disk array device executes reassignment, the storage locations of the data block and redundant data may change after reassignment. In the following description, a current LBA means an LBA indicating the current storage location of the data block or redundant data. First, when notified of the storage locations of the parity group by the controller 7, the address conversion part 11 accesses the second table storage part 10 to specify the original LBA of the data block or redundant data. The second table storage part 10 manages a second table 101 as shown in FIG. 27. In FIG. 27, the current LBA of the data block or redundant data is registered with its original LBA in the second table 101. Registration processing of the current LBA will be described later.

[0526] When a current LBA is registered for the currently-derived original LBA, the address conversion part 11 extracts the current LBA from the second table 101. The address conversion part 11 determines that the data block or redundant data to be read is stored in the recording area indicated by the extracted current LBA. On the other hand, when no current LBA is registered for the currently-derived original LBA, the address conversion part 11 determines that the data block or redundant data to be read is stored in the recording area indicated by the original LBA. In this way, the address conversion part 11 specifies the LBA's indicating the correct recording areas of the data blocks and redundant data to be read. The address conversion part 11 notifies the controller 7 of the specified LBA's.
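
For illustration only, this lookup can be sketched with the second table 101 modeled as a mapping from original LBA to current LBA (the table contents below are hypothetical):

    def resolve_lba(second_table, original_lba):
        # A registered entry means the block was reassigned to an alternate area;
        # otherwise the block still resides at its original LBA.
        return second_table.get(original_lba, original_lba)

    second_table = {1000: 52001}   # original LBA 1000 reassigned to current LBA 52001
    assert resolve_lba(second_table, 1000) == 52001   # reassigned block
    assert resolve_lba(second_table, 2000) == 2000    # never reassigned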

[0527] The controller 7 issues a set of second read requests to read the parity group (four data blocks and redundant data) using the LBA's from the address conversion part 11 (step S2). In the present embodiment, since the parity group is distributed across the five disk drives 5A to 5D and 5P as shown in FIG. 3 or 20, five second read requests are issued. Each second read request includes, as described in the first embodiment, the LBA as the storage location of the data block or redundant data, and information on the buffer area (any of 3A_(i) to 3D_(i) and 3P_(i)) for storing the read data block or redundant data. The second read requests are transmitted to the SCSI interfaces 4A to 4D and 4P, respectively.

[0528] When transmitting the second read requests to the SCSI interfaces 4A to 4D and 4P, the controller 7 creates the issue time table 71 as shown in FIG. 9 (step S21). Since the processing of creating the issue time table 71 has been described above, its description is omitted herein.

[0529] The SCSI interfaces 4A to 4D and 4P transmit the received second read requests to the disk drives 5A to 5D and 5P, respectively. In response to the second read requests, the disk drives 5A to 5D and 5P start reading of the data blocks and redundant data. Each reading, however, will either be successfully completed or eventually fail.

[0530] When reading has been successfully completed, the disk drives 5A to 5D and 5P transmit the read data blocks and redundant data to the SCSI interfaces 4A to 4D and 4P. Further, each disk drive transmits an ACK, a read response indicating that reading has been successfully completed, to its corresponding SCSI interface. On receiving the ACK, each SCSI interface identifies which second read request the received ACK corresponds to, and stores the read data block or redundant data in the corresponding one of the buffer areas 3A_(i) to 3D_(i) and 3P_(i) (refer to FIG. 2) specified by the controller 7. Further, each SCSI interface transmits the received ACK to the controller 7.

[0531] On the other hand, when reading has failed, each of the disk drives 5A to 5D and 5P transmits a NAK, a read response indicating that reading has failed, to its corresponding SCSI interface. On receiving the NAK, each SCSI interface transmits the received NAK to the controller 7.

[0532] As evident from the above, one of the two read responses, an ACK or a NAK, is transmitted from each SCSI interface to the controller 7. Note that, in most cases, the read responses from the SCSI interfaces 4A to 4D and 4P arrive at different times. For example, when the disk drive 5A takes much time to read the data block, the read response from the SCSI interface 4A arrives at the controller 7 later than the other read responses.

[0533] The controller 7 executes the procedure shown in a flow chart of FIG. 28 whenever a read response arrives at the controller 7. When receiving a read response (step S81), the controller 7 determines whether the response is an ACK or a NAK (step S82). When it is a NAK, the procedure advances to step S88, which will be described later. On the other hand, when it is an ACK, the controller 7 determines whether four data blocks of the same parity group have been stored in the buffer areas (step S83). More specifically, in step S83, it is determined whether the data block has been successfully read in each of the disk drives 5A to 5D. In other words, the controller 7 determines whether all the ACK's from the SCSI interfaces 4A to 4D have been received.

[0534] When determining that all four data blocks have been stored, the procedure advances to step S84, which will be described later. When determining in step S83 that four data blocks have not yet been stored, the controller 7 determines whether the remaining data block can be recovered by the calculation of parity or not (step S814). More specifically, in step S814, it is determined whether three data blocks and the redundant data of the same parity group have been successfully read or not. In other words, it is determined whether the controller 7 has received three ACK's from any three of the SCSI interfaces 4A to 4D and an ACK from the SCSI interface 4P.
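
For illustration only, the two determinations of steps S83 and S814 can be sketched as predicates over the set of SCSI interfaces whose ACK's have arrived (the names below are assumptions of this sketch):

    DATA_IFS = {"4A", "4B", "4C", "4D"}   # interfaces serving data-block drives

    def all_blocks_stored(acks):
        """Step S83: ACK's from all four data-block interfaces received."""
        return DATA_IFS <= acks

    def recoverable_by_parity(acks):
        """Step S814: three data-block ACK's plus the ACK from interface 4P."""
        return "4P" in acks and len(acks & DATA_IFS) >= 3

    acks = {"4A", "4B", "4D", "4P"}
    print(all_blocks_stored(acks), recoverable_by_parity(acks))   # False True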

[0535] When determining in step S814 that the remaining data block cannot be recovered, that is, when four such ACK's have not been received at the execution of step S814, the controller 7 temporarily terminates the procedure shown in the flow chart of FIG. 28. The controller 7 then waits for a new read response from any of the SCSI interfaces 4A to 4D and 4P.

[0536] When the procedure advances from step S83 to step S84, four data blocks of the same parity group have been stored in the buffer memories, as described above. The disk array device of the third embodiment waits until reading of the remaining data block is completed, up to a lapse of the time margin T_(MARGIN) from the time three data blocks and the redundant data are stored in the buffer memories (the time t_(4th)). Similarly, the disk array device according to the present embodiment waits until reading of the remaining data block is completed even if three data blocks and the redundant data are already stored in the buffer memories. Therefore, at the execution of step S84, four data blocks of the same parity group may be stored in the buffer memories 3A to 3D, or four data blocks and the redundant data of the same parity group may be stored in the buffer memories 3A to 3D and 3P. The controller 7 therefore determines whether reading of the redundant data has been completed or not (step S84). In other words, the controller 7 determines whether it has received an ACK from the SCSI interface 4P.

[0537] When determining in step S84 that reading of the redundant data has not yet been completed, the controller 7 generates a read termination request and transmits the same to the reassignment part 8 (step S85). The read termination request is now described. At the time of step S84, since four data blocks have been stored, the data can be assembled without execution of the calculation of parity. The controller 7 therefore realizes that the redundant data being read is no longer necessary. The read termination request transmitted in step S85 is a signal for requesting the reassignment part 8 to terminate reading of such unnecessary redundant data. This read termination request includes information on the storage location (LBA) of the unnecessary redundant data. In response to the read termination request, the reassignment part 8 executes the processing shown in a flow chart of FIG. 34, which will be described later. After the controller 7 ends the processing of step S85, the procedure advances to step S86.

[0538] On the other hand, when the controller 7 determines in step S84 that the redundant data has been read, the procedure advances to step S87. The procedure advances to step S87 only when the four data blocks and the redundant data have all been completely read. In other words, reading of the last data block is completed while the first timer 72 set in step S815 (described later) is active. Therefore, the first timer 72 does not have to count down anymore. The controller 7 stops the active first timer 72 (step S87), and then the procedure advances to step S86.

[0539] In step S86, the controller 7 generates a READ-COMPLETED, and transmits the same to the selector 2. The READ-COMPLETED is a signal for notifying the selector 2 that four data blocks of the same parity group have been stored in the buffer memories 3A to 3D to allow data assembling. The READ-COMPLETED includes information for specifying the four buffer areas 3A_(i) to 3D_(i) in which the four data blocks of the same parity group are stored. According to the received READ-COMPLETED, the selector 2 sequentially selects the four buffer areas 3A_(i) to 3D_(i) to read the four data blocks. The selector 2 further assembles the data of 2048 bytes from the four read data blocks. The assembled data is transmitted through the host interface 1 to the host device.

[0540] When the procedure advances from step S814 to step S815, three data blocks and the redundant data of the same parity group have been stored in the buffer memories, as described above. The disk array device according to the present embodiment waits until reading of the remaining data block has been completed. Therefore, the controller 7 calculates a timeout value V_(TO1), and sets the first timer 72 to the calculated timeout value V_(TO1) (step S815). This activates the first timer 72 to start countdown. The processing of step S815 is the same as that of step S43 of FIG. 12b, and therefore its description is omitted herein.

[0541] After the first timer 72 is set in step S815, the controller 7 waits until a new read response from any of the SCSI interfaces 4A to 4D and 4P arrives.

[0542] When the procedure advances from step S82 to step S88, a NAK has arrived at the controller 7. The controller 7 determines in step S88 whether the first timer 72 is active or not. When determining that the first timer 72 is not active, the procedure advances to step S811, which will be described later. On the other hand, when determining that the first timer 72 is active, the NAK indicates that reading of the remaining data block, which had not yet been completed in step S814, has eventually failed thereafter. The controller 7 realizes that countdown by the first timer 72 is no longer necessary, and stops the countdown (step S89). The controller 7 also realizes that reading of the remaining data block has failed and that the data block has to be recovered. The controller 7 thus issues a recovery instruction to the parity calculator 6 to perform the calculation of parity (step S810). The parity calculator 6 recovers the remaining unread data block, and stores the same in the buffer memory 3P. The parity calculator 6 then issues a RECOVERY-COMPLETED, a signal indicating that recovery of the data block has been successfully completed, to the controller 7. In response to the RECOVERY-COMPLETED, the controller 7 issues a READ-COMPLETED to the selector 2 (step S86). As a result, the data is transmitted to the host device.

[0543] When the procedure advances from step S88 to step S811, at most three read responses have arrived. The disk array device of the present embodiment distributes the parity group across the five disk drives 5A to 5D and 5P. When reading fails in two of these disk drives, data block recovery by the calculation of parity can no longer be expected. Therefore, the controller 7 determines in step S811 whether data block recovery by the calculation of parity can be expected or not. More specifically, in step S811, it is determined whether two of the read responses received by the controller 7 are NAK's.

[0544] When determining in step S811 that data block recovery by the calculation of parity can be expected (that is, when determining for the first time that one of the read responses is a NAK), the controller 7 temporarily ends the procedure shown in FIG. 28. The controller 7 then waits until a new read response from any of the SCSI interfaces 4A to 4D and 4P arrives.

[0545] On the other hand, when the controller 7 determines in step S811 that data block recovery by the calculation of parity cannot be expected (that is, when it determines for the second time that a read response is a NAK), the procedure advances to step S812, wherein the controller 7 issues a read termination request to the reassignment part 8. This read termination request is now described. In step S812, some of the disk drives 5A to 5D and 5P have not yet completed reading. For example, when the first and second read responses are both NAK's, three of the disk drives have not completed reading. Since data block recovery cannot be expected once two read responses are NAK's, the controller 7 determines in step S812 that the data blocks or redundant data which have not yet been completely read are unnecessary. Therefore, the controller 7 transmits a read termination request in step S812 to request the reassignment part 8 to terminate reading of such unnecessary data blocks or redundant data. This read termination request includes information on the storage locations (LBA's) of the unnecessary data blocks or redundant data. In response to the read termination request from the controller 7, the reassignment part 8 executes the processing shown in a flow chart of FIG. 34, which will be described later. After the controller 7 ends the processing of step S812, the procedure advances to step S813.

[0546] When the data block cannot be recovered, the data cannot be transmitted to the host device, and therefore the controller 7 generates a READ-FAILED (step S813). The generated READ-FAILED is transmitted to the host device.

[0547] When the first timer 72 is timed-out, the controller 7 executes the procedure shown in FIG. 12b. Note that, since the procedure has been described before, its description is omitted herein.

[0548] When issuing a set of second read requests, the controller 7 subtracts the issue time t_(ISSUE) from the present time t_(PRE) by referring to the issue time table 71. The controller 7 then determines whether the calculated value (t_(PRE)−t_(ISSUE)) exceeds the limit time T_(LIMIT). When two of the disk drives 5A to 5D and 5P have not yet completed reading by the time it is determined that the value exceeds the limit time T_(LIMIT), the controller 7 specifies the disk drives in which reading has not yet been completed. The controller 7 then issues a read termination command to each of the specified disk drives. Note that, since this procedure has been described with reference to FIG. 8b, its description is omitted herein.

[0549] Described next is the operation of the reassignment part 8 with reference to FIGS. 29 to 34. As described above, the SCSI interfaces 4A to 4D and 4P are additionally provided with new structure relating to the reassignment part 8. The new structure includes, as shown in FIG. 29, notifying parts 42A to 42D and 42P. When the SCSI interfaces 4A to 4D and 4P transmit second read requests to the disk drives 5A to 5D and 5P, respectively, each of the notifying parts 42A to 42D and 42P generates a transmission notification indicating the transmission of the second read request. The generated notifications are transmitted to the reassignment part 8. Each transmission notification includes an ID uniquely specifying the transmitted second read request, and the LBA specified by the second read request. When the SCSI interfaces 4A to 4D and 4P receive a read response (ACK or NAK) from the disk drives 5A to 5D and 5P, respectively, each of the notifying parts 42A to 42D and 42P further generates a receive notification indicating the receipt of the read response. The generated receive notifications are transmitted to the reassignment part 8. Each receive notification includes an ID uniquely specifying the second read request corresponding to the received read response, and the LBA specified by that second read request. The reassignment part 8 can operate correctly even if the LBA is not included in the receive notification.

[0550] Moreover, the reassignment part 8 includes, as shown in FIG. 29, a third timer 81 indicating the present time of day, a first list 82, and a second list 83, and executes the procedure for reassignment shown in a flow chart of FIG. 30 whenever it receives a transmission notification. For specific description, assume herein that the reassignment part 8 receives a transmission notification from the SCSI interface 4A. The received transmission notification includes the ID “b” and the LBA “a”.

[0551] The reassignment part 8 first detects the receive time of the transmission notification based on the present time indicated by the third timer 81. The reassignment part 8 uses this receive time as the time when the SCSI interface 4A transmits a second read request to the disk drive 5A. Now assume that the time when the second read request is transmitted is t_(t1). The reassignment part 8 extracts the ID “b” and the LBA “a” from the received transmission notification (step S91).

[0552] Now described below are the first list 82 and the second list 83. The first list 82 has, as shown in FIG. 31(a-1), fields in which the ID, LBA, and process start time are registered. The first list 82 is created whenever a second read request is transmitted (that is, whenever the reassignment part 8 receives a transmission notification). The reassignment part 8 classifies and manages the created first lists 82 for each destination of the second read request. In other words, the first lists 82 are classified and managed for each of the disk drives 5A to 5D and 5P (that is, for each of the SCSI interfaces 4A to 4D and 4P). Furthermore, the first lists 82 for each disk drive are sorted in the transmission order of the second read requests. Now assume that the plurality of first lists 82 shown in FIG. 31(a-1) are created in response to the second read requests to be transmitted to the disk drive 5A. In FIG. 31(a-1), as indicated by an arrow, the information on a new (later-transmitted) second read request is registered in the first list 82 located frontward, while the information on an old (earlier-transmitted) second read request is registered in the first list 82 located backward.

[0553] The second list 83 has, as shown in FIG. 31(b-1), fields in which the LBA storing the data block or redundant data and a counter value N are registered.

[0554] After step S91, the reassignment part 8 determines whether plural second read requests are kept in the destination of the present second read request (hereinafter referred to as the present target disk drive) (step S92), which is now more specifically described. Here, the present target disk drive is the disk drive 5A. As described above, the first list 82 is created whenever a second read request is transmitted to the disk drives 5A to 5D and 5P, and the created first lists 82 are sorted and managed for each disk drive. Further, the first list 82 is deleted when the corresponding second read request has been completely processed or forcefully terminated in the disk drive. Therefore, the reassignment part 8 can know the number of second read requests kept in the present target disk drive (disk drive 5A) by, for example, counting the number of first lists 82 managed therefor. Note that, in step S92, the reassignment part 8 determines that plural second read requests are kept in the present target disk drive (disk drive 5A) even if only one first list 82 is managed, for the following reason: the first list 82 has not yet been created for the present second read request in step S91, and the reassignment part 8 manages only the first list(s) 82 for the second read request(s) transmitted to the disk drive 5A before step S91. In step S92, however, both the second read request(s) transmitted before step S91 and the present second read request are kept in the present target disk drive (disk drive 5A), and therefore the reassignment part 8 determines that plural second read requests are kept.

[0555] When determining in step S92 that plural second read requests are not kept, the reassignment part 8 creates a new first list 82, and registers therein the LBA “a” and the ID “b” extracted in step S91. The reassignment part 8 also registers the transmission time t_(t1) detected in step S91 as the process start time in that first list 82. Further, having received the transmission notification from the SCSI interface 4A in step S91, the reassignment part 8 classifies the created first list 82 as for the disk drive 5A and manages the same (step S93). As a result, such information as shown in FIG. 31(a-2) is registered in the created first list 82.

[0556] On the other hand, when determining in step S92 that plural second read requests are kept, the procedure advances to step S94. The present second read request is not processed in the present target disk drive until the other previous read requests have been completely processed. In other words, the present second read request has to wait to be processed in the present target disk drive. If the procedure advanced from step S92 to step S93, the transmission time t_(t1) detected in step S91 would improperly be set as the process start time in the first list 82. Therefore, the procedure advances from step S92 not to step S93 but to step S94, in which the reassignment part 8 registers only the LBA “a” and the ID “b” extracted in step S91 in the first list 82 and manages the same. Here, note that the process start time not registered in step S94 will be registered later (refer to step S104 of FIG. 32 for detail).
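The first lists 82 and the bookkeeping of steps S91 to S94 lend themselves to a compact sketch. The following Python fragment is a minimal illustration under assumed names (Entry, first_lists, on_transmission_notification); it is not the specification's implementation.

    import time
    from dataclasses import dataclass
    from typing import Dict, List, Optional

    @dataclass
    class Entry:
        """One first list 82: ID, LBA, and a possibly deferred process start time."""
        request_id: str
        lba: int
        start_time: Optional[float] = None

    # First lists are kept per destination drive, in transmission order.
    first_lists: Dict[str, List[Entry]] = {d: [] for d in ("5A", "5B", "5C", "5D", "5P")}

    def on_transmission_notification(drive: str, request_id: str, lba: int) -> None:
        """Steps S91 to S94: the process start time is registered immediately
        only when the drive keeps no earlier second read request; otherwise it
        is filled in later (step S104 of FIG. 32 or step S117 of FIG. 33)."""
        entry = Entry(request_id, lba)
        if not first_lists[drive]:               # step S92: no earlier request kept
            entry.start_time = time.monotonic()  # step S93
        first_lists[drive].append(entry)         # step S94 otherwise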

[0557] In addition to the procedure shown in FIG. 30, the reassignment part 8 executes another procedure shown in a flow chart of FIG. 32. FIG. 32 shows the processing of the reassignment part 8 for detecting a defective area. First, the reassignment part 8 refers to the first lists 82 presently kept, and measures a delay time T_(D) of each second read request transmitted to each of the disk drives 5A to 5D and 5P. The delay time T_(D) indicates the time between the start of processing of the second read request by each disk drive and the present time.

[0558] Measurement processing of the delay time T_(D) is now described more specifically. As evident from above, one first list 82 is created whenever the SCSI interface 4A transmits a second read request to the disk drive 5A. This applies to the other disk drives 5B to 5D and 5P as well. Some of the first lists 82 include the process start time of the second read request registered therein. The reassignment part 8 selects one of the first lists 82 with the process start time registered as the first list 82 to be processed. The reassignment part 8 then fetches the process start time from the selected first list 82. The reassignment part 8 also obtains the present time T_(P) from the third timer 81. The reassignment part 8 subtracts the fetched process start time from the present time T_(P). The subtraction result is used as the delay time T_(D) of the second read request corresponding to the first list 82 to be processed.

[0559] The reassignment part 8 previously stores the limit time T_(L) therein. The limit time T_(L) is a previously-determined indicator for determining whether each disk drive includes a defective area or not. The limit time T_(L) is preferably the time which ensures data transmission without interruption of video and audio at the host device. The reassignment part 8 determines whether the calculated delay time T_(D) exceeds the limit time T_(L) or not (step S101 of FIG. 32). When the delay time T_(D) exceeds the limit time T_(L), the reassignment part 8 determines that the processing of the second read request specified by the first list 82 to be processed is delayed, and that there is a possibility that the LBA specified by the second read request is defective.

[0560] The processing in step S101 is now described more specifically. Assume that the reassignment part 8 selects the first list 82 shown in FIG. 31(a-2). This first list 82 includes the ID “b”, the LBA “a”, and the process start time “t_(t1)” registered therein. Therefore, the delay time T_(D) of the second read request specified by the ID “b” is calculated by T_(P)−t_(t1). Further, the reassignment part 8 determines whether T_(D)>T_(L) is satisfied. If not, the reassignment part 8 selects another first list 82 to be processed, and executes step S101. When no other first list 82 can be selected, the reassignment part 8 ends the procedure of FIG. 32.
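Continuing the sketch above (the assumed Entry and first_lists structures are redeclared here so the fragment stands alone), step S101 might look like the following. The value of T_L is an assumption for illustration.

    import time
    from dataclasses import dataclass
    from typing import Dict, List, Optional, Tuple

    T_L = 0.5  # limit time T_L in seconds; an assumed value

    @dataclass
    class Entry:
        request_id: str
        lba: int
        start_time: Optional[float] = None

    first_lists: Dict[str, List[Entry]] = {d: [] for d in ("5A", "5B", "5C", "5D", "5P")}

    def find_delayed_request() -> Optional[Tuple[str, Entry]]:
        """Step S101: return a drive and first list whose second read request
        has been in process longer than T_L, or None when nothing is delayed."""
        now = time.monotonic()
        for drive, entries in first_lists.items():
            for e in entries:
                if e.start_time is None:
                    continue                    # the drive has not started this request
                if now - e.start_time > T_L:    # T_D = T_P minus the process start time
                    return drive, e
        return None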

[0561] On the other hand, when T_(D)>T_(L) is satisfied in step S101, the reassignment part 8 instructs the SCSI interface 4 to terminate the processing of the second read request specified by the first list 82 to be processed (step S102). In step S102, in order to terminate the processing of the second read request, the reassignment part 8 generates an ABORT_TAG message, one of the SCSI messages, and transmits the same to the SCSI interface 4. The SCSI interface 4 transmits the ABORT_TAG message to the disk drive 5 connected thereto. In response to the received ABORT_TAG message, the disk drive 5 terminates the second read request specified by the ID “b”. Here, since the second read request specified by the ID “b” has been transmitted through the SCSI interface 4A to the disk drive 5A, the reassignment part 8 transmits the ABORT_TAG message to the disk drive 5A through the SCSI interface 4A, causing the disk drive 5A to terminate the processing of the second read request specified by the ID “b”.

[0562] After transmitting the ABORT_TAG message, the SCSI interface 4 transmits a NAK, indicating that the processing of the second read request specified by the ID “b” has failed, to the controller 7.

[0563] After step S102, the reassignment part 8 determines the disk drive 5 specified by the first list 82 to be processed. The reassignment part 8 then determines whether plural second read requests are kept in the determined disk drive 5 to be processed (step S103).

[0564] When the reassignment part 8 determines in step S103 that plural second read requests are kept, that is, plural first lists 82 are managed in the reassignment part 8, the procedure advances to step S104. Here, plural first lists 82 are managed for the disk drive 5A to be processed. Further, in step S108 or S1013 described later, the selected first list 82 is deleted. Therefore, at this time, as shown in FIG. 31(a-3), the reassignment part 8 manages therein the first list 82 to be processed and the first list 82 created next (hereinafter referred to as the “next first list 82”). The next first list 82 is shown as surrounded by a dotted line in FIG. 31(a-3). Note that the next first list 82 does not include the process start time registered, because it was created in step S94 of FIG. 30. To register the process start time, the reassignment part 8 first obtains the present time T_(P) from the third timer 81, and registers the present time T_(P) in the next first list 82 (step S104). The procedure then advances to step S105.

[0565] On the other hand, when the reassignment part 8 determines in step S103 that plural second read requests are not kept, the procedure skips step S104 and advances to step S105.

[0566] The reassignment part 8 then fetches the registered LBA from the first list 82 to be processed. The fetched LBA is hereinafter referred to as the LBA to be checked. Here, the LBA to be checked is “a”, and may possibly be defective. The reassignment part 8 searches the second lists 83 managed therein (refer to FIG. 31(b-1)) based on the LBA to be checked to determine whether any second list 83 with the LBA to be checked registered therein is present (step S105).

[0567] As described above, the second list 83 includes the fields for registering the LBA and the counter value N therein. The counter value N indicates how many times the LBA to be checked has successively satisfied T_(D)>T_(L) in step S101. Therefore, if any second list 83 with the LBA to be checked registered therein is found in step S105, the LBA to be checked was determined to be possibly defective also at the previous check. That is, the second read request for reading the data block or redundant data from the LBA to be checked has by now been transmitted successively at least twice (at the previous time and this time). Moreover, the reassignment part 8 has successively determined twice that the LBA to be checked satisfies T_(D)>T_(L) in step S101 executed in response to each second read request. On the other hand, when no second list 83 with the LBA to be checked registered therein can be found, the LBA to be checked is determined for the first time to be possibly defective.

[0568] When the second list 83 with the LBA to be checked registered therein can be found in step S105, the procedure advances to step S109. Otherwise, the procedure advances to step S106, wherein a new second list 83 is created. As shown in FIG. 31(b-2), the reassignment part 8 registers the LBA to be checked (“a”, in this example) in the LBA field of the created second list 83. The reassignment part 8 also registers a default value “1” in the counter field thereof (step S106).

[0569] After step S106, the reassignment part 8 determines whether the counter value N in the second list 83 with the LBA to be checked registered therein (hereinafter referred to as the second list 83 to be processed) reaches a limit value N_(L) or not (step S107). The limit value N_(L) is a predetermined threshold for determining whether the LBA to be checked is defective or not. The limit value N_(L) is a natural number of 1 or more, determined according to the specifications of the present disk array device. In the present embodiment, assume that “2” is selected for the limit value N_(L). Since the second list 83 to be processed is the one newly created in step S106, the counter value N of “1” is registered in the second list 83 to be processed (refer to FIG. 31(b-2)). The reassignment part 8 therefore determines that the counter value N does not reach the limit value N_(L), and the procedure advances to step S108.

[0570] The reassignment part 8 then determines that the first list 82 to be processed is no longer necessary, and deletes the first list 82 (step S108). This processing prevents the first list 82 from being redundantly selected for processing. Here, the reassignment part 8 deletes the first list 82 with the ID “b”, the LBA “a”, and the process start time “t_(t1)” registered therein. Note that the second list 83 to be processed is not deleted in step S108. After step S108, the procedure returns to step S101, wherein the reassignment part 8 selects another first list 82 to be processed to continue the procedure. When the counter value N reaches the limit value N_(L) in step S107, the procedure advances to step S1010.

[0571] Furthermore, another first read request may arrive at the controller 7 from the host device. In response to the other first read request, the controller 7 transmits a set of second read requests to the SCSI interfaces 4A to 4D and 4P. The SCSI interfaces 4A to 4D and 4P transmit the received second read requests to the disk drives 5A to 5D and 5P, respectively. Assume that the second read request transmitted to the disk drive 5A indicates reading the data block from the LBA “a”. In this case, the notifying part 42A of the SCSI interface 4A generates a transmission notification for the second read request transmitted to the disk drive 5A, and transmits the notification to the reassignment part 8. Here, assume that this transmission notification includes the ID “c” and the LBA “a”.

[0572] On receiving the transmission notification, the reassignment part 8 starts the procedure shown in FIG. 30, first obtaining the present time T_(P) from the third timer 81. The present time T_(P) is used, as described above, as the time when the SCSI interface 4A transmits the second read request to the disk drive 5A. Here, assume that the transmission time of the second read request is t_(t2). The reassignment part 8 extracts the ID “c” and the LBA “a” from the received transmission notification (step S91). The reassignment part 8 then executes steps S92 and S93, or steps S92 and S94, to create a new first list 82 for the present second read request, and then ends the procedure of FIG. 30. Assuming that the present target disk drive (disk drive 5A) keeps only one second read request, the first list 82 includes the LBA “a”, the ID “c”, and the process start time “t_(t2)” registered therein (refer to FIG. 31(a-4)).

[0573] The reassignment part 8 further executes the procedure of FIG. 32. The reassignment part 8 first selects the first list 82 to be processed from the first lists 82 stored therein. The reassignment part 8 then determines whether the delay time T_(D) calculated by referring to the first list 82 to be processed exceeds the limit time T_(L) (step S101). Here, assume that the first list 82 to be processed is as shown in FIG. 31(a-4). In this case, the delay time T_(D) can be obtained by T_(P)−t_(t2). When T_(D) (=T_(P)−t_(t2))>T_(L) is satisfied, the reassignment part 8 terminates processing of the second read request specified by the first list 82 to be processed (step S102), and then determines whether another first list 82 is managed therein for the target disk drive (disk drive 5A) (step S103). Here, since the present target disk drive (disk drive 5A) keeps one second read request, the procedure directly advances from step S103 to step S105. The reassignment part 8 then fetches the LBA in the first list 82 to be processed as the LBA to be checked (“a”, at present). The reassignment part 8 then searches the managed second lists 83 based on the LBA to be checked to determine whether any second list 83 with the LBA to be checked registered therein is present (step S105).

[0574] As described above, since the reassignment part 8 manages the second list 83 as shown in FIG. 31(b-2), the procedure advances to step S109. Here, the second list 83 with the LBA to be checked registered therein is to be processed by the reassignment part 8, as described above.

[0575] The reassignment part 8 increments the counter value N registered in the second list 83 to be processed by “1” (step S109). Here, the counter value N in FIG. 31(b-2) is incremented by “1”, resulting in “2” as shown in FIG. 31(b-3). After step S109, the reassignment part 8 determines whether the counter value N reaches the limit value N_(L) (“2”, as described above) or not (step S107). Since the counter value N is “2”, the reassignment part 8 assumes that the recording area specified by the LBA to be checked (the LBA “a” of the disk drive 5A, at present) is defective, and the procedure advances to step S1010.
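The successive-delay counter of steps S105 to S109 reduces to a few lines. The sketch below is illustrative only (second_lists and check_lba are assumed names); N_L is “2” as in the present embodiment.

    from dataclasses import dataclass
    from typing import List

    N_L = 2  # limit value N_L of the present embodiment

    @dataclass
    class SecondList:
        lba: int
        counter: int = 1  # counter value N; "1" is the default of step S106

    second_lists: List[SecondList] = []

    def check_lba(lba: int) -> bool:
        """Steps S105 to S109: count successive delays per LBA; True means the
        recording area is deemed defective and reassignment should follow."""
        for s in second_lists:
            if s.lba == lba:
                s.counter += 1            # step S109
                return s.counter >= N_L   # step S107
        second_lists.append(SecondList(lba))  # step S106: first suspicion
        return 1 >= N_L                       # step S107 (false while N_L is 2)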

[0576] The reassignment part 8 accesses the first table 91 (refer to FIG. 25) managed by the first table storage part 9, and selects one of the LBA's specifying currently available alternate areas. The reassignment part 8 thus selects the alternate area to be assigned to the defective area (step S1010). The size of the selected alternate area is equal to that of the data block or redundant data (512 bytes, in the present embodiment).

[0577] The reassignment part 8 notifies the address conversion part 11 of the LBA of the defective area (the LBA “a” of the disk drive 5A, at present) and the LBA of the selected alternate area (step S1011). The address conversion part 11 registers the LBA's of the defective and alternate areas received from the reassignment part 8 in the second table 101 (refer to FIG. 27) managed by the second table storage part 10. Note that, in FIG. 27, the LBA of the defective area specifies the original storage location of the data block or redundant data, and is therefore described as the original LBA in the second table. Furthermore, the LBA of the alternate area specifies the current recording area of the data block or redundant data previously recorded in the defective area, and is therefore described as the current LBA. With the address information thus updated, the controller 7 uses the current LBA when it next generates a second read request for reading the reassigned data block or redundant data.
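The second table 101 is essentially a mapping from original LBA's to current LBA's. A minimal sketch under assumed names (second_table, reassign, resolve) follows; for instance, after reassign(0x1000, 0x9000) with hypothetical values, a later second read request for LBA 0x1000 would be directed to 0x9000.

    from typing import Dict

    second_table: Dict[int, int] = {}  # original LBA (defective) -> current LBA (alternate)

    def reassign(defective_lba: int, alternate_lba: int) -> None:
        """Step S1011: record that the alternate area now holds the data."""
        second_table[defective_lba] = alternate_lba

    def resolve(lba: int) -> int:
        """Return the LBA the controller should actually read: the alternate
        area when one has been assigned, the original LBA otherwise."""
        return second_table.get(lba, lba)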

[0578] After step S1011, the reassignment part 8 updates the first table 91 in the first table storage part 9 so as not to redundantly select the alternate area selected in step S1010 (step S1012). This updating prevents the reassignment part 8 from redundantly selecting the present alternate area. After the reassignment, the first list 82 and the second list 83 to be processed are no longer necessary, and therefore the reassignment part 8 deletes these lists (step S1013). Furthermore, the reassignment part 8 generates a REASSIGN-COMPLETED notification, a signal indicating that the reassign processing has ended, and transmits the same to the controller 7 (step S1014). The REASSIGN-COMPLETED notification includes information on the LBA's of the defective area and the alternate area.

[0579] In response to the REASSIGN-COMPLETED notification from the reassignment part 8, the controller 7 recovers the unread data block or redundant data according to the architecture of the RAID level adopted in the present embodiment, and then writes the recovered data block or redundant data in the alternate area of the present target disk drive (on which the reassignment has been executed). Since this processing is known art, its description is omitted herein. With this writing of the data block or redundant data, the parity group recorded over the disk drives 5A to 5D and 5P can maintain consistency before and after the reassignment.

[0580] As described above, in the disk array device according to the present embodiment, reassign processing is executed when a defective area is detected on any of the disk drives 5A to 5D and 5P. As a result, an alternate area is assigned to the defective area. The unread data block or redundant data is stored in the alternate area. In other words, the data block or redundant data is not left in the defective area. Therefore, after detection of a defective area, the disk array device accesses not the defective area but the alternate area when attempting to read the data block or redundant data. Consequently, delay of reading due to continuous access to the defective area as described at the outset of the present embodiment can be prevented.

[0581] In the present embodiment, to clarify the timing of assigning an alternate area, the operation when a read response is received by each of the SCSI interfaces 4A to 4D and 4P has been described with part of the operation omitted. When a read response is returned to each SCSI interface, the contents of the first list 82 are changed according to the time when the read response is returned and the like. Described next is the operation of updating the first list 82 when a read response is returned.

[0582] The notifying parts 42A to 42D and 42P generate a receive notification whenever the SCSI interfaces 4A to 4D and 4P receive a read response from the disk drives 5A to 5D and 5P, respectively, and transmit the receive notification to the reassignment part 8. The receive notification includes the ID of the second read request on which the received read response is based, and the LBA specified by the second read request. More specifically, assume that the SCSI interface 4A receives the read response including the ID “b” and the LBA “a”. In this case, the SCSI interface 4A transmits the receive notification to the reassignment part 8. Note that the processing of updating the first list 82 is executed irrespective of whether the read response is an ACK or a NAK.

[0583] In response to the receive notification, the reassignment part 8 executes the procedure shown by a flow chart of FIG. 33. The reassignment part 8 first extracts the ID “b” and the LBA “a” from the received receive notification. The reassignment part 8 then searches the first lists 82 being managed therein for the one in which the ID “b” is registered (hereinafter referred to as the first list 82 to be deleted) (step S111). When the reassignment part 8 does not manage the first list 82 with the ID “b” registered therein even though the second read request has been transmitted, that means such a list has been deleted in step S108 or S1013 of FIG. 32. In this case, that is, when the reassignment part 8 cannot find the first list 82 to be deleted in step S111, execution of steps S112 to S115 of FIG. 33 is not required, and the procedure directly advances from step S111 to step S116.

[0584] On the other hand, when the reassignment part 8 finds the first list 82 to be deleted in step S111, T_(D)>T_(L) has not been satisfied in step S101 of FIG. 32 by the time immediately before the receive notification is received (that is, immediately before the present read response is returned thereto). Thus, the reassignment part 8 determines whether T_(D)>T_(L) is satisfied or not at this time, based on the information registered in the first list 82 to be deleted (step S112). When the delay time T_(D) exceeds the limit time T_(L), the reassignment part 8 has to determine whether the alternate area has to be assigned to the defective area, and the procedure therefore advances to step S103 and thereafter shown in FIG. 32, which are indicated by “B” in the flow chart of FIG. 33.

[0585] On the other hand, when the delay time T_(D) does not exceed the limit time T_(L), that means the reading by the disk drive 5A does not take a long time, and the LBA “a” is not defective. Therefore, the reassignment part 8 determines whether it manages the second list 83 in which the same LBA as that in the first list 82 to be deleted is registered (step S113). When managing such a second list 83, the reassignment part 8 deletes the second list 83 (step S114), and the procedure advances to step S115. Otherwise, the procedure directly advances from step S113 to step S115, wherein the reassignment part 8 deletes the first list 82 to be deleted.

[0586] The reassignment part 8 then determines whether another second read request is kept in the disk drive 5 from which the present read response was transmitted (hereinafter referred to as the present transmitting drive), based on the number of first lists 82 being managed for the present transmitting drive (step S116). When another second read request is kept, the process start time has not yet been registered in the first list 82 created in response to the other second read request (the next first list 82). The reassignment part 8 therefore obtains the present time T_(P) from the third timer 81, defining that processing of the other second read request is started at T_(P) in the present transmitting drive. The reassignment part 8 registers the obtained present time T_(P) as the process start time for the other second read request in the next first list 82 (step S117), and ends the procedure of FIG. 33.

[0587] On the other hand, when another second read request is not kept, the reassignment part 8 does not execute step S117, and ends the procedure of FIG. 33.

[0588] In step S85 of FIG. 28, the controller 7 transmits the read termination request for terminating reading of the redundant data to the reassignment part 8. The controller 7 also transmits, in step S812 of FIG. 28, the read termination request for terminating reading of the unnecessary data block or redundant data. As described above, each read termination request includes the LBA specifying the storage location of the data block or redundant data whose reading is to be terminated. Described next, with reference to FIG. 34, is the procedure when the reassignment part 8 receives a read termination request.

[0589] The reassignment part 8 extracts the LBA from the received read termination request, and determines whether reading of the data block or redundant data from the LBA has been started (step S121). More specifically, the reassignment part 8 first searches the first lists 82 being managed therein for the one in which the LBA whose reading should be terminated is registered. The reassignment part 8 then determines whether the process start time has been registered in the found first list 82 or not. As evident from above, the process start time is not necessarily registered on creation of the first list 82. Therefore, at the start of the procedure of FIG. 34, the reassignment part 8 may hold first lists 82 both with and without the process start time registered therein. Here, if the process start time has been registered in the first list 82, that means reading of the data block or redundant data from the corresponding LBA has been started. Therefore, based on whether the process start time has been registered in the found first list 82, the reassignment part 8 determines whether processing of the second read request corresponding to the first list 82 has been started.

[0590] When determining in step S121 that reading from the LBA extracted from the read termination request has been started, the reassignment part 8 ends the procedure of FIG. 34.

[0591] On the other hand, when determining that the reading from the LBA has not yet been started, the reassignment part 8 transmits an ABORT_TAG message, one of the SCSI messages, through the SCSI interface 4 to the disk drive 5 including the extracted LBA, terminating the execution of processing of the second read request corresponding to the found first list 82 (step S122). The SCSI interface 4 also transmits a NAK, indicating that the reading for the corresponding second read request has failed, to the controller 7.

[0592] After step S122, the reassignment part 8 deletes the first list 82 found in step S121 (step S123).
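The whole of FIG. 34 can be sketched compactly. The fragment below is illustrative (Entry, first_lists, and abort_tag are assumed names; abort_tag merely stands in for sending an ABORT_TAG message through the SCSI interface):

    from dataclasses import dataclass
    from typing import Dict, List, Optional

    @dataclass
    class Entry:
        request_id: str
        lba: int
        start_time: Optional[float] = None  # None until the drive starts the request

    first_lists: Dict[str, List[Entry]] = {d: [] for d in ("5A", "5B", "5C", "5D", "5P")}

    def abort_tag(drive: str, request_id: str) -> None:
        # Hypothetical stand-in for an ABORT_TAG message via the SCSI interface.
        print(f"ABORT_TAG for request {request_id} sent to drive {drive}")

    def on_read_termination_request(lba: int) -> None:
        """FIG. 34: terminate only a request the drive has not yet started, so
        that the delay measurement for started requests remains meaningful."""
        for drive, entries in first_lists.items():
            for e in entries:
                if e.lba == lba:
                    if e.start_time is not None:
                        return                      # step S121: already started
                    abort_tag(drive, e.request_id)  # step S122
                    entries.remove(e)               # step S123
                    return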

[0593] As described above, the reassignment part 8 terminates the processing of the second read request in response to the read termination request from the controller 7 only when the condition of step S121 is satisfied, allowing correct detection of the defective area in the disk drives 5A to 5D and 5P. If the reassignment part 8 unconditionally terminated the processing in response to the read termination request, T_(D)>T_(L) would not be satisfied for most of the second read requests. As a result, the reassignment part 8 might not be able to correctly detect the defective area.

Seventh Embodiment

[0594] In the disk array device according to the fifth embodiment, the storage location of the data block requiring much time to be read is stored in the faulty block table 75. By referring to such faulty block table 75, the controller 7 determines whether to transmit five or four second read requests, thereby realizing a disk array device capable of reading a large volume of data per unit of time. However, the more faulty data blocks requiring much time to be read are written into the faulty block table 75, the more often the disk array device transmits five second read requests. As a result, the volume of data to be read per unit of time becomes smaller. The seventh embodiment solves the above problem, realizing a disk array device capable of reading a larger volume of data per unit of time.

[0595] FIG. 35 is a block diagram showing the structure of the disk array device according to the seventh embodiment of the present invention. The disk array device of FIG. 35 is different from that of FIG. 24 in that the controller 7 includes the same faulty block table 75 as that shown in FIG. 19. Since the other structures are the same, the components in FIG. 35 are provided with the same reference numerals as those in FIG. 24 and their description is omitted herein.

[0596] Furthermore, note that, in the present embodiment, the redundant data is distributed across the disk drives 5A to 5D and 5P as shown in FIG. 20.

[0597] Like the sixth embodiment, in response to the first read request, the present disk array device also starts the read operation that is distinctive of the present embodiment, which is now described in detail with reference to a flow chart in FIG. 36. FIG. 36 is the flow chart showing the procedure from the time when the first read request arrives at the controller 7 to the time when a set of second read requests is transmitted. Since the flow chart in FIG. 36 partially includes the same steps as those in FIG. 26, such steps in FIG. 36 are provided with the same step numbers as those in FIG. 26 and their description is simplified herein.

[0598] When provided with the first read request (step S1), the controller 7 fetches the LBA's specifying the storage locations of the parity group to be read from the address conversion part 11 (step S71). In other words, the controller 7 fetches the LBA's indicative of the storage locations of the data blocks and redundant data of the same parity group.

[0599] The controller 7 next determines whether reading of any of the four data blocks to be read this time from the disk drives 5A to 5D and 5P has previously failed (step S131). For the determination in step S131, the controller 7 refers to the faulty block table 75, in which the storage locations of data blocks whose reading has previously failed are listed, as shown in FIG. 22 (note that the storage locations are indicated by the LBA's in the present embodiment). Therefore, the controller 7 can easily make the determination in step S131 by comparing the LBA of each data block fetched from the address conversion part 11 with the LBA's listed in the faulty block table 75.

[0600] When determining in step S131 that reading of the four data blocks has not previously failed, the controller 7 determines that there is a low possibility of failing to read the four data blocks this time, and issues a set of second read requests to read the parity group (step S132). In step S132, however, the second read requests are issued only to the four disk drives storing the data blocks, and not to the remaining disk drive storing the redundant data.

[0601] When determining in step S131 that reading of any of the four data blocks has previously failed, the controller 7 determines that there is a high possibility of failing to read the four data blocks this time, and issues a set of second read requests to read the parity group (step S133). In step S133, however, the second read requests are issued to the four disk drives storing the data blocks as well as to the remaining disk drive storing the redundant data.

[0602] The second read requests issued in step S132 are processed by the four disk drives storing the data blocks of the same parity group, while those issued in step S133 are processed by the five disk drives storing the data blocks and redundant data of the same parity group. In either case, each of the four or five disk drives generates a read response indicating that reading has succeeded or failed. The four or five disk drives transmit the generated read responses through the SCSI interfaces connected thereto to the controller 7. The controller 7 executes the procedure shown in FIG. 37 whenever a read response arrives. The flow chart of FIG. 37 includes the same steps as those in the flow chart of FIG. 28, and further includes step S141. Therefore, the steps in FIG. 37 are provided with the same step numbers as those in FIG. 28 and their description is omitted herein.

[0603] When determining that a NAK has arrived (step S82), the controller 7 extracts the LBA from the NAK. The LBA included in the NAK indicates the storage location of the data block or redundant data which failed to be read. The controller 7 registers the LBA extracted from the NAK in the faulty block table 75 (step S141). Note that step S141 may be executed at any timing as long as it is after the determination in step S82 that the present read response is a NAK. That is, the execution timing of step S141 is not restricted to the timing immediately after it is determined in step S82 that the present read response is a NAK.

[0604] The reassignment part 8 executes the procedure described above in the sixth embodiment. Description of this procedure is therefore omitted herein. The important point here is that, when the reassignment ends, the reassignment part 8 transmits a REASSIGN-COMPLETED notification, indicating that the reassignment has ended, to the controller 7. This REASSIGN-COMPLETED notification includes the LBA indicative of the storage location that is determined to be defective by the reassignment part 8. Since it takes much time to read from the defective area, the LBA indicative of such a defective storage area is also written in the faulty block table 75.

[0605] When receiving the REASSIGN-COMPLETED notification, the controller 7 executes the procedure shown in FIG. 38. First, on receiving the REASSIGN-COMPLETED notification, the controller 7 determines that the reassignment part 8 has executed reassignment (step S151), and the procedure advances to step S152. In step S152, the controller 7 extracts the LBA from the REASSIGN-COMPLETED notification. The controller 7 then accesses the faulty block table 75, and deletes the LBA matching the one extracted from the REASSIGN-COMPLETED notification from the faulty block table 75, thereby updating the faulty block table 75 (step S152).
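Taken together, steps S131, S141, and S152 give the faulty block table 75 a simple lifecycle: an LBA is added on a NAK, consulted before issuing second read requests, and removed once reassignment completes. A minimal sketch under assumed names (faulty_lbas, issue_targets) follows; it is illustrative, not the specification's implementation.

    from typing import List, Set

    faulty_lbas: Set[int] = set()  # sketch of the faulty block table 75

    def on_nak(lba: int) -> None:
        faulty_lbas.add(lba)  # step S141

    def issue_targets(data_lbas: List[int], parity_lba: int) -> List[int]:
        """Steps S131 to S133: read the parity drive too only when reading of
        some data block previously failed."""
        if any(lba in faulty_lbas for lba in data_lbas):  # step S131
            return data_lbas + [parity_lba]               # step S133: five requests
        return list(data_lbas)                            # step S132: four requests

    def on_reassign_completed(defective_lba: int) -> None:
        faulty_lbas.discard(defective_lba)  # step S152: the LBA is no longer faulty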

[0606] As described above, also in the disk array device according to the seventh embodiment, the storage location requiring much time to be read is assumed to be defective, and an alternate storage location is assigned thereto. That is, the storage location of the data block or redundant data is changed from the defective area to the alternate area. In response to such reassignment, the controller 7 updates the faulty block table 75, preventing the data block or redundant data from being kept stored in the defective area for a long time. Furthermore, in the present embodiment, the number of LBA's written in the faulty block table 75 decreases with every reassignment. Consequently, as it becomes less likely that the storage location (LBA) of a data block fetched from the address conversion part 11 is listed in the faulty block table 75, the controller 7 can transmit four second read requests more often. As a result, it is possible to realize a disk array device capable of reading a larger volume of data per unit of time.

[0607] In the above-described first to seventh embodiments, the disk array device includes five disk drives. The number of disk drives, however, may be changed according to design requirements of the disk array device such as the data length and the data block length, and therefore is not restricted to five. Note that “m” in the Claims corresponds to the number of disk drives included in the disk array device.

[0608] Furthermore, in the above-described first to seventh embodiments, the host device transmits data of 2048 bytes to the disk array device of each embodiment, and the disk array device divides the received data into data blocks of 512 bytes each. The sizes of the data and the data block are, however, just one example for simplifying description, and are not restricted to 2048 bytes and 512 bytes, respectively.

Eighth Embodiment

[0609] As described in the Background Art section, the disk array device executes reconstruction processing in some cases. In an eighth embodiment of the present invention, reconstruction is to recover the data block or redundant data in a faulty disk drive and rewrite the recovered data block or redundant data in a disk drive (another disk drive, or a recording area without a defect in the faulty disk drive). Furthermore, the disk array device has to transmit video data so that the video being replayed at the host device is not interrupted. To prevent this interruption of video, when a read request for video data arrives, the disk array device has to process the read request in real time to transmit the video data. The eighth embodiment realizes a disk array device capable of transmitting video data without interruption while executing reconstruction.

[0610] FIG. 39 is a block diagram showing the structure of the disk array device according to the eighth embodiment of the present invention. In FIG. 39, the disk array device is constructed of a combination of RAID-4 and RAID-5 architectures, including an array controller 21 and a disk array 22. The array controller 21 includes a host interface 31, a request rank identifying part 32, a controller 33, a queue managing part 34, a request selector 35, a disk interface 36, a buffer managing part 37, a parity calculator 38, and a table storage part 39. The disk array 22 is constructed of five disk drives 41A to 41D and 41P.

[0611] Illustration of the structure is partly simplified in FIG. 39, as space does not allow detailed illustration. With reference to FIG. 40, described next in detail is the structure of the queue managing part 34, the request selector 35, and the disk interface 36. In FIG. 40, the queue managing part 34 is constructed of queue managing units 34A to 34D and 34P, which are assigned to the disk drives 41A to 41D and 41P, respectively. The queue managing unit 34A manages a non-priority queue 341A and a priority queue 342A. The queue managing unit 34B manages a non-priority queue 341B and a priority queue 342B. The queue managing unit 34C manages a non-priority queue 341C and a priority queue 342C. The queue managing unit 34D manages a non-priority queue 341D and a priority queue 342D. The queue managing unit 34P manages a non-priority queue 341P and a priority queue 342P. The request selector 35 is constructed of request selection units 35A to 35D and 35P, which are assigned to the disk drives 41A to 41D and 41P, respectively. The disk interface 36 is constructed of SCSI interfaces 36A to 36D and 36P, which are assigned to the disk drives 41A to 41D and 41P, respectively.

[0612] Described next is the detailed structure of the buffer managing part 37 with reference to FIG. 41. In FIG. 41, the buffer managing part 37 manages buffer memories 37A to 37D, 37P, and 37R. The buffer memory 37A is divided into a plurality of buffer areas 37A₁, 37A₂, . . . Each buffer area has a capacity for storing a data block or redundant data, which will be described below. Further, an identifier (normally, the top address of each buffer area) is assigned to each buffer area to uniquely identify it. The identifier of each buffer area is hereinafter referred to as a pointer. Each of the other buffer memories 37B to 37D, 37P, and 37R is also divided into a plurality of buffer areas. A pointer is also assigned to each buffer area, like the buffer area 37A₁.

[0613] Referring back to FIG. 40, the disk group of the disk drives 41A to 41D and 41P is now described. Since the architecture of the present disk array device is based on the combination of RAID-3 and RAID-4, the data blocks and redundant data of the same parity group are distributed across the disk drives 41A to 41D and 41P, which form one disk group. Here, the parity group is, as described in the Background Art section, a set of data blocks and redundant data generated based on one piece of data transmitted from the host device. The disk group is a set of the plurality of disk drives into which the data blocks and redundant data of the same parity group are written. In the present embodiment, the disk group of the disk drives 41A to 41D and 41P is hereinafter referred to as the disk group “A”. Further, a plurality of LUN's (Logical Unit Numbers) are assigned to each disk group. The LUN's are different for each disk group, and the LUN's in one disk group are also different from each other. Such LUN's are used for specifying the disk group to be accessed and the level of priority of an access request. In the present embodiment, “non-priority” and “priority” are previously defined as the levels of priority of an access request. Two LUN's “0” and “1” are assigned to the disk group “A”. The LUN “0” represents that the access request is given “non-priority”, while the LUN “1” represents that the access request is given “priority”.
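The LUN convention above amounts to a small lookup from LUN to the pair of disk group and priority. The following sketch is illustrative only; its entries reflect just the disk group “A” described here, and the names LUN_TABLE and identify are assumptions.

    # LUN -> (disk group, level of priority), per the convention described above.
    LUN_TABLE = {
        0: ("A", "non-priority"),
        1: ("A", "priority"),
    }

    def identify(lun: int):
        """Sketch of the request rank identifying part 32: map a LUN to the
        disk group to be accessed and the priority of the access request."""
        return LUN_TABLE[lun]

    # For example, identify(1) yields ("A", "priority").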

[0614] Described briefly next is the host device placed outside the disk array device. The host device is connected to the host interface 31 so as to be able to bi-directionally communicate therewith. The I/O interface between the host device and the host interface 31 is based on SCSI (Small Computer System Interface). To write or read data, the host device requests access to the disk array device. The procedure of access is now described below. First, the host device gains control of the SCSI bus through the ARBITRATION phase. The host device then specifies a target disk array device through the SELECTION phase. The host device then transmits an Identify message (refer to FIG. 42a), one of the SCSI messages, to specify the LUN, thereby specifying the disk group to be accessed and the level of priority of the access request. Further, the host device transmits a Simple_Queue_Tag message (refer to FIG. 42b), one of the SCSI messages, to transmit a plurality of access requests to the disk array device. To read data, the host device sends a Read_10 command, a SCSI command (refer to FIG. 43a), to the disk array device. The Read_10 command specifies the LBA specifying the storage location of the data to be read and the length of the data. To write data, the host device sends a Write_10 command (refer to FIG. 43b) to the disk array device. The Write_10 command specifies the LBA specifying the storage location of the data to be written and the length of the data. The host device further transmits the data to be written to the disk array device. In this manner, the host device requests access to the disk array device.

[0615] The data to be written into the disk array device is now described. The transmission data from the host device includes two types: real-time data and non-real-time data. The real-time data is data, such as video data, to be processed in the disk array device in real time. The non-real-time data is data, such as computer data, which does not necessarily have to be processed in the disk array device in real time. The real-time data and non-real-time data are large in general. A plurality of host devices are connected to the disk array device, sharing one SCSI bus. If such large real-time data or non-real-time data were written into the disk array device all at once, the SCSI bus would be used exclusively by a specific host device, and could not be used by the other host devices. To prevent such detriment, the host device divides the large real-time data or non-real-time data into pieces of a predetermined size, and transmits the data to the disk array device by that size. In other words, the host device sends only part of the data of the predetermined size in one request, and executes this sending operation several times to write the whole data, thereby preventing the SCSI bus from being used exclusively by a specific host device.

[0616] Described next, with reference to a flow chart of FIG. 44, is how the disk array device operates when the host device requests the disk group “A” to write non-real-time data. Since the non-real-time data is processed in the disk array device not necessarily in real time, the LUN composed of the set of “0” and “A” is set in the Identify message to be sent during the access request. Further, the host device sends the non-real-time data to be written and a Write_10 command to the disk array device.

[0617] When receiving the SCSI message, the SCSI command, and the data (non-real-time data) to be written from the host device (step S161), the host interface 31 determines that the host device requests access, and the procedure advances to step S162. The host interface 31 then generates a first process request based on the access request from the host device.

[0618] FIG. 45 shows the format of the first process request to be generated by the host interface 31. In FIG. 45, the first process request includes information on a command type, an identification number, the LUN, control information, the LBA, and the data length. As the command type, the operation code of the Write_10 command is set. For convenience in description, assume herein that “W” is set in the command type for the Write_10 command. With this command type, the host interface 31 specifies that the generated first process request is for writing. As the identification number, the number indicative of the queue tag included in the received Simple_Queue_Tag message is set. As the LUN, the number indicative of the LUN included in the Identify message received by the host interface 31 is set. When the host device requests the disk group “A” to write non-real-time data, the set of “0” indicative of the priority of the present access request and “A” indicative of the disk group to be accessed is set as the present LUN. As the control information, cache control information such as DPO and FUA included in the Read_10 or Write_10 command received by the host interface 31 is set. As the LBA, the value specifying the LBA included in the Read_10 or Write_10 command is set. As the data length, the length of the data to be read by the Read_10 command or to be written by the Write_10 command is set. Furthermore, only when the host interface 31 receives a Write_10 command, the data is set in the first process request. The data in the first process request is the data itself (non-real-time data or real-time data) transmitted with the Write_10 command from the host device. The first process request generated in the above manner is transmitted to the request rank identifying part 32 (step S162).

[0619] When receiving the first process request, the request rank identifying part 32 extracts the information on the LUN from the request (step S163). The request rank identifying part 32 further identifies the level of priority of the received first process request, determining which disk group is requested to be accessed (step S164). Since the set of “0” and “A” is extracted as the LUN from the present first process request, the request rank identifying part 32 identifies the level of priority as “non-priority” and the disk group as “A”. After the identification ends, the request rank identifying part 32 transmits the received first process request and the identification results (“non-priority” and the disk group “A”) to the controller 33 (step S165).

[0620] When receiving the first process request and the identification results from the request rank identifying part 32, the controller 33 determines whether the first process request has priority or not (step S166). When the information on priority is “non-priority”, the controller 33 determines whether the operation called “Read_Modify_Write” is required or not (step S167). More specifically, in step S167, the controller 33 determines whether to read the data blocks required for updating the redundant data stored in the disk drive 41P (these data blocks are hereinafter referred to as data blocks for update) or not. When the controller 33 determines not to read the data blocks for update, the procedure directly advances to step S1612, which will be described later. That is, write operation according to the RAID-3 architecture is executed.

[0621] On the other hand, when determining to read the data blocks for update, the controller 33 generates first read requests to read the data blocks for update. The first read request has a format shown in FIG. 46, which is different from that shown in FIG. 45 in that the information on the LUN is replaced with the level of priority and the disk group. Since the level of priority is “non-priority” and the disk group is “A” in the present first process request, the controller 33 enqueues the generated first read requests to the non-priority queues 341A to 341D assigned to the disk drives 41A to 41D, respectively (step S168).

[0622] Each of the request selection units 35A to 35D and 35P executes the processing of step S169. Specifically, when the disk drive 41A ends processing (read or write), the request selection unit 35A first determines whether any request generated by the controller 33, such as the second read request, has been enqueued to the priority queue 342A assigned to the disk drive 41A. When determining that a request has been enqueued, the request selection unit 35A selects and dequeues one of the requests from the priority queue 342A, and transmits the dequeued request to the SCSI interface 36A assigned to the disk drive 41A. The SCSI interface 36A instructs the disk drive 41A to execute the received request.

[0623] When determining that no request has been enqueued to the priority queue 342A, that is, the priority queue 342A is empty, the request selection unit 35A determines whether any request generated by the controller 33, such as the first read request, has been enqueued to the non-priority queue 341A assigned to the disk drive 41A. When determining that a request has been enqueued, the request selection unit 35A selects and dequeues one of the requests from the non-priority queue 341A. The SCSI interface 36A instructs the disk drive 41A to execute the request dequeued from the non-priority queue 341A.

[0624] When determining that no request has been enqueued to the non-priority queue 341A either, that is, the priority queue 342A and the non-priority queue 341A are both empty, the request selection unit 35A waits for the disk drive 41A to end the present processing (step S169).

[0625] As described above, the request selection unit 35A transmits a request in the priority queue 342A to the SCSI interface 36A with higher priority than a request in the non-priority queue 341A. Since the other request selection units 35B to 35D and 35P perform the same processing as described for the request selection unit 35A, its description is omitted herein.
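The selection rule of step S169 is a strict two-level priority scheme. A minimal sketch follows, showing one queue pair for one drive; it is illustrative only, and deque is used merely to keep FIFO order within each level.

    from collections import deque

    priority_queue = deque()      # e.g., priority queue 342A
    non_priority_queue = deque()  # e.g., non-priority queue 341A

    def select_next_request():
        """Step S169: the priority queue is always drained before the
        non-priority queue; None means both queues are empty and the request
        selection unit waits."""
        if priority_queue:
            return priority_queue.popleft()
        if non_priority_queue:
            return non_priority_queue.popleft()
        return None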

[0626] When a request is sent from the SCSI interfaces 36A to 36D and 36P, the disk drives 41A to 41D and 41P respectively process the received request (step S1610). Therefore, the first read requests enqueued to the non-priority queues 341A to 341D are processed by the disk drives 41A to 41D with lower priority than the requests enqueued to the priority queues 342A to 342D. Therefore, the data blocks for update of the non-real-time data are read by the disk drives 41A to 41D without affecting reading and writing of the real-time data. When reading of the data blocks for update has been successfully completed, the disk drives 41A to 41D transmit the read data blocks for update and a READ-COMPLETED, a signal indicating that reading has been successfully completed, to the SCSI interfaces 36A to 36D, respectively.

[0627] When receiving the data blocks for update and the READ-COMPLETED's, the SCSI interfaces 36A to 36D store the data blocks for update in predetermined buffer areas 37A_(i) to 37D_(i) (i=1, 2, . . . ). The buffer areas 37A_(i) to 37D_(i) are specified by the controller 33. That is, pointers indicative of the buffer areas 37A_(i) to 37D_(i) are set in the first read requests which have triggered reading of the data blocks for update. According to the pointers in the first read requests, the SCSI interfaces 36A to 36D specify the buffer areas 37A_(i) to 37D_(i) in which the data blocks for update are to be stored. The SCSI interfaces 36A to 36D then transmit the received READ-COMPLETED's to the controller 33.

[0628] Based on the READ-COMPLETED's, the controller 33 determines whether the disk drives 41A to 41D have ended reading of the data blocks for update. When the data blocks for update have been stored in the buffer areas 37A_(i) to 37D_(i) (step S1611), the controller 33 extracts the non-real-time data included in the present process request. In “Read_Modify_Write”, the extracted non-real-time data belongs to the same parity group as the data blocks for update stored in the buffer areas 37A_(i) to 37D_(i), and the data blocks composing that parity group are to be updated with it. The controller 33 therefore stores the extracted non-real-time data in the buffer areas in which the data blocks to be updated are stored. For example, to update the entire data block in the buffer area 37A_(i), the controller 33 writes the extracted non-real-time data over the data block in the buffer area 37A_(i).

[0629] The controller 33 then instructs the parity calculator 38 to perform the parity calculation. In response to the instruction, the parity calculator 38 performs the parity calculation to create new redundant data according to the present updating of the non-real-time data. The created redundant data is stored in the buffer area 37R_(i) (i=1, 2, . . . ). Thus, the entire data blocks and redundant data (the parity group) to be updated are stored in the buffer areas.
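The parity calculation itself is the byte-wise exclusive OR of the data blocks of the parity group. A minimal sketch of what the parity calculator 38 computes (the function name is an assumption):

    def xor_parity(blocks):
        """Byte-wise XOR of equal-length data blocks; the result is the
        redundant data to be written to the parity drive."""
        parity = bytearray(len(blocks[0]))
        for block in blocks:
            for i, byte in enumerate(block):
                parity[i] ^= byte
        return bytes(parity)

    # For example, with four 512-byte data blocks d_a, d_b, d_c, d_d:
    #     redundant = xor_parity([d_a, d_b, d_c, d_d])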

[0630] The procedure then advances to step S1612. The controller 33 first generates a first write request to write the updated redundant data in the disk drive 41P. The controller 33 then reconfirms that the level of priority of the present first process request is “non-priority”. After reconfirmation, the controller 33 enqueues the generated first write request to the non-priority queue 341P assigned to the disk drive 41P (step S1612).

[0631] The controller 33 next replaces the information on the LUN in the present first process request with the received information on the priority and the disk group, thereby converting the first process request into second write requests to the disk drives 41A to 41D. The controller 33 generates as many second write requests as the number of the disk drives 41A to 41D. Here, the second write request has the same format as that of the first read request (refer to FIG. 46). The controller 33 then enqueues the generated second write requests to the non-priority queues 341A to 341D assigned to the disk drives 41A to 41D, respectively, according to the information of “non-priority” and the disk group “A” (step S1613).

[0632] Each of the request selection units 35A to 35D and 35P executes the processing as described above in step S169. Thus, the first write request enqueued to the non-priority queue 341P is processed by the disk drive 41P with lower priority. The new redundant data stored in the buffer area 37P_(i) is therefore written into the disk drive 41P. The second write requests in the non-priority queues 341A to 341D are also processed by the disk drives 41A to 41D, respectively, with lower priority, so that the data blocks in the buffer areas 37A_(i) to 37D_(i) are written into the disk drives 41A to 41D. In this manner, according to the access request by the host device, the non-real-time data is made redundant and distributed across the disk drives 41A to 41D and 41P in the disk array 22.

[0633] After completing its writing, each disk drive generates a WRITE-COMPLETED, a signal indicating that writing has been completed. The generated WRITE-COMPLETED's are transmitted through the SCSI interfaces 36A to 36D and 36P to the controller 33. When receiving all WRITE-COMPLETED's generated by the disk drives 41A to 41D and 41P (step S1614), the controller 33 determines that the non-real-time data requested from the host device has been completely written in the disk drives. Further, the controller 33 notifies the host device through the host interface 31 that writing of the non-real-time data has been ended (step S1615).

[0634] Described next is how the present disk array device operates when the host device requests the disk group “A” to write real-time data, with reference to the flow chart shown in FIG. 44. Since real-time data has to be processed in the disk array device in real time, the LUN composed of a set of “1” and “A” is set in the Identify message (refer to FIG. 42a) to be sent during the process of access request. Further, the host device transmits the real-time data to be written and a Write_10 command to the disk array device.

[0635] When receiving the access request (a series of the SCSI message, the SCSI command, and the real-time data) transmitted from the host device (step S161), the host interface 31 generates a second process request, and transmits the request to the request rank identifying part 32 (step S162). Here, the second process request has the same format as that of the first process request (refer to FIG. 45).

[0636] When receiving the second process request, the request rank identifying part 32 identifies the level of priority of the received second process request and determines which disk group is requested to be accessed (steps S163 and S164). Since the set of “1” and “A” is extracted as the LUN from the present second process request, the request rank identifying part 32 identifies the level of priority as “priority” and the disk group as “A”. After the identification ends, the request rank identifying part 32 transmits the received second process request and the identification results (“priority” and the disk group “A”) to the controller 33 (step S165).
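
The identification performed by the request rank identifying part 32 can be pictured with a small sketch. The representation of the LUN as a (priority bit, disk group) pair and the function name are assumptions for illustration only:

    def identify_request_rank(lun):
        # lun is assumed to be a pair such as ("1", "A"): the first
        # element is the priority bit, the second the disk group.
        priority_bit, disk_group = lun
        level = "priority" if priority_bit == "1" else "non-priority"
        return level, disk_group

    # identify_request_rank(("1", "A")) -> ("priority", "A")
    # identify_request_rank(("0", "A")) -> ("non-priority", "A")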

[0637] When the level of priority received is “priority”, the procedure from steps S1616 to S1622 is similar to that from steps S167 to S1613, and therefore mainly described below are the differences between steps S167 to S1613 and steps S1616 to S1622.

[0638] By referring to the information on priority included in the received identification results, the controller 33 determines whether the second process request has priority or not (step S166). Even when the information on priority is “priority”, the controller 33 also determines whether the operation called “Read_Modify_Write” is required or not (step S1616). More specifically, in step S1616, the controller 33 determines whether to read the data blocks for update or not. When the controller 33 determines not to read the data blocks for update, the procedure directly advances to step S1621. That is, write operation according to the RAID-3 architecture is executed.

[0639] On the other hand, when determining to read the data blocks for update, the controller 33 generates second read requests to read the data blocks for update. The second read request has the same format as that of the first read request (refer to FIG. 46), but the information on priority is “priority” instead of “non-priority”. Since the level of priority is “priority” and the disk group is “A” in the present second process request, the controller 33 enqueues the generated second read requests to the priority queues 342A to 342D assigned to the disk drives 41A to 41D, respectively (step S1617).

[0640] Each of the request selection units 35A to 35D and 35P executes step S1618, which is the same as step S169. Each of the disk drives 41A to 41D then executes step S1619, which is the same as step S1610. As a result, the second read requests in the priority queues 342A to 342D are processed by the disk drives 41A to 41D with higher priority than those in the non-priority queues 341A to 341D. When processing of the second read requests is normally ended, each of the disk drives 41A to 41D transmits the read data block for update and a READ-COMPLETED to each corresponding buffer area 37A_(i) to 37D_(i) and to the controller 33 through the SCSI interfaces 36A to 36D, respectively.

[0641] If the data blocks for update have been stored in the buffer areas 37A_(i) to 37D_(i) (step S1620), the controller 33 extracts the real-time data included in the second process request, and stores the extracted real-time data in the buffer area in which the data block to be updated is stored.

[0642] The controller 33 then instructs the parity calculator 38 to perform the parity calculation. In response to this instruction, the parity calculator 38 calculates parity, creating new redundant data reflecting the update of the real-time data and storing the same in the buffer area 37R_(i) (i=1, 2, . . . ).

[0643] The procedure then advances to step S1621, wherein the controller 33 generates a third write request for writing the updated redundant data in the disk drive 41P. The controller 33 reconfirms that the level of priority of the present second process request is “priority”. After reconfirmation, the controller 33 enqueues the generated third write request to the priority queue 342P (step S1621).

[0644] The controller 33 next replaces the information on the LUN in the present second process request with the received information on priority and the disk group, thereby converting the second process request into fourth write requests to the disk drives 41A to 41D. The controller 33 generates as many fourth write requests as there are disk drives 41A to 41D. Here, the fourth write request has the same format as that of the first read request (refer to FIG. 46). The controller 33 then enqueues the generated fourth write requests to the priority queues 342A to 342D according to the information of “priority” and the disk group “A” (step S1622).

[0645] Each of the request selection units 35A to 35D and 35P executes processing of step S1618. Thus, the third write request enqueued to the priority queue 342P is processed by the disk drive 41P with higher priority. The new redundant data stored in the buffer area 37P_(i) is therefore written into the disk drive 41P. The fourth write requests in the priority queues 342A to 342D are also processed by the disk drives 41A to 41D, respectively, with priority. Thus, the data blocks in the buffer areas 37A_(i) to 37D_(i) are written in the disk drives 41A to 41D. In this manner, according to the access request by the host device, the real-time data is made redundant and distributed across the disk drives 41A to 41D and 41P in the disk array 22.

[0646] After completing its writing, each disk drive transmits a WRITE-COMPLETED through the SCSI interfaces 36A to 36D and 36P to the controller 33. When receiving all WRITE-COMPLETED's generated by the disk drives 41A to 41D and 41P (step S1614), the controller 33 determines that the real-time data requested from the host device has been completely written in the disk drives. Further, the controller 33 notifies the host device through the host interface 31 that writing of the real-time data has been ended (step S1615).

[0647] Described next is how the disk array device operates when the host device requests the disk group “A” to read non-real-time data, with reference to a flow chart of FIG. 47. Since the non-real-time data is processed in the disk array device not necessarily in real time, the LUN composed of a set of “0” and “A” is set in the Identify message to be sent during the access request. Further, the host device transmits a Read_10 command to the disk array device.

[0648] As shown in the flow chart of FIG. 47, when receiving the SCSI message and the SCSI command requesting the non-real-time data to be read from the host device (step S171), the host interface 31 determines that the host device requests access, and the procedure advances to step S172. The host interface 31 then generates a third process request having the same format as that of the first process request based on the access request from the host device (step S172).

[0649] When receiving the third process request, the request rank identifying part 32 extracts the information on the LUN from the request (step S173). The request rank identifying part 32 further identifies the level of priority of the received third process request, and determines which disk group is requested to be accessed (step S174). Since the set of “0” and “A” is extracted as the LUN from the present third process request, the request rank identifying part 32 identifies the level of priority as “non-priority” and the disk group as “A”. After the identification ends, the request rank identifying part 32 transmits the received third process request and the identification results (“non-priority” and the disk group “A”) to the controller 33 (step S175).

[0650] When receiving the third process request and the identification results from the request rank identifying part 32, the controller 33 determines whether the third process request has priority or not (step S176).

[0651] When the information on priority is “non-priority”, the controller 33 replaces the information on the LUN in the present third process request with the received information on priority and the disk group, thereby converting the third process request into third read requests to the disk drives 41A to 41D. The controller 33 generates as many third read requests as there are disk drives 41A to 41D. Here, the third read request has the same format as that of the first read request (refer to FIG. 46). The controller 33 then enqueues the generated third read requests to the non-priority queues 341A to 341D assigned to the disk drives 41A to 41D, respectively, according to the information “non-priority” and the disk group “A” (step S177).

[0652] When the disk drives 41A to 41D end processing (read or write), each of the request selection units 35A to 35D executes the processing of step S178, which is the same as step S169. Thus, the third read requests in the non-priority queues 341A to 341D are processed by the disk drives 41A to 41D with lower priority (step S179). Therefore, the data blocks composing the non-real-time data are read by the disk drives 41A to 41D without affecting reading and writing of the real-time data. If reading the data blocks has been normally completed, the disk drives 41A to 41D transmit the read data blocks and a READ-COMPLETED to the SCSI interfaces 36A to 36D, respectively. When receiving the data blocks and the READ-COMPLETED's, the SCSI interfaces 36A to 36D store the data blocks in predetermined buffer areas 37A_(i) to 37D_(i) (i=1, 2, . . . ). The buffer areas 37A_(i) to 37D_(i) are specified by the controller 33. That is, pointers indicative of the buffer areas 37A_(i) to 37D_(i) are set in the third read requests which have triggered reading of the data blocks. According to the pointers in the third read requests, the SCSI interfaces 36A to 36D specify the buffer areas 37A_(i) to 37D_(i) in which the data blocks are to be stored. The SCSI interfaces 36A to 36D transmit the received READ-COMPLETED's to the controller 33.

[0653] On the other hand, if reading of the data blocks (non-real-time data) has not been normally completed due to a failure or the like, each of the disk drives 41A to 41D generates a READ-FAILED, a signal indicating that the reading has not been normally completed. The generated READ-FAILED's are transmitted through the SCSI interfaces 36A to 36D to the controller 33.

[0654] The controller 33 determines whether the disk drives 41A to 41D have successfully completed reading the data blocks (non-real-time data) or not (step S1710). When receiving READ-COMPLETED's from the disk drives 41A to 41D, the controller 33 determines that the disk drives 41A to 41D have successfully completed reading the data blocks, and further recognizes that the data blocks have been stored in the buffer areas 37A_(i) to 37D_(i) (step S1711). The controller 33 then transmits the pointers of the buffer areas 37A_(i) to 37D_(i) and the information for specifying the order of the data blocks to the host interface 31, instructing it to transmit the non-real-time data to the host device. When receiving such information, the host interface 31 accesses the buffer areas 37A_(i) to 37D_(i) according to the order of the data blocks to fetch the data blocks from these buffer areas. Thus, the data blocks are assembled into the non-real-time data to be transmitted to the host device. The host interface 31 transmits the assembled non-real-time data to the host device (step S1712).

[0655] On the other hand, in step S1710, when receiving a READ-FAILED from any of the disk drives 41A to 41D, the controller 33 determines that not all of the disk drives 41A to 41D have successfully completed reading. The procedure then advances to step S1713, wherein the processing at the time of abnormal reading is executed.

[0656] FIG. 48 is a flow chart showing the procedure of step S1713 in detail. The controller 33 generates a new fourth read request to recover the unread data block (step S181). The processing in step S181 is defined by the RAID-3 architecture. The fourth read request is a signal for reading the redundant data from the disk drive 41P.

[0657] The controller 33 then reconfirms whether the information on priority is “priority” or “non-priority” (step S182). When “non-priority”, the controller 33 enqueues the generated fourth read request to the non-priority queue 341P (step S183).

[0658] If the disk drive 41P has completed processing (read or write), the request selection unit 35P executes processing similar to that of step S178 in FIG. 47 (step S184). With step S184, the fourth read request in the non-priority queue 341P is processed by the disk drive 41P with lower priority (step S185). As a result, the redundant data composing the non-real-time data requested to be read is read from the disk drive 41P without affecting the processing (read or write) of the real-time data. If reading has been normally completed, the disk drive 41P transmits the redundant data and a READ-COMPLETED to the SCSI interface 36P. When receiving the redundant data and the READ-COMPLETED, the SCSI interface 36P stores the redundant data in the predetermined buffer area 37P_(i) (i=1, 2, . . . ). The buffer area 37P_(i) is specified by the controller 33. That is, a pointer indicative of the buffer area 37P_(i) is set in the fourth read request which has triggered reading of the redundant data. According to the pointer in the fourth read request, the SCSI interface 36P specifies the buffer area 37P_(i) in which the redundant data is to be stored. The SCSI interface 36P transmits the received READ-COMPLETED to the controller 33.

[0659] When receiving the READ-COMPLETED, the controller 33 instructs the parity calculator 38 to perform the parity calculation. In response to this instruction, the parity calculator 38 calculates parity to recover the faulty data block. The recovered data block is stored in the buffer area 37R_(i) (i=1, 2, . . . ) (step S186). The controller 33 then exits from the procedure of FIG. 48 to return to step S1711 of FIG. 47. When the processing shown in FIG. 48 at the time of abnormal reading ends, all data blocks composing the requested non-real-time data have been stored in the buffer areas (step S1711). Then, the host interface 31 transmits the non-real-time data to the host device, as described above.
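
Because the redundant data is the XOR parity of the data blocks, the faulty block can be rebuilt by XOR-ing the parity with the blocks that were read successfully. A minimal sketch, with hypothetical names (the patent only specifies that the parity calculator 38 performs this calculation):

    def recover_block(surviving_blocks, parity):
        # XOR of the parity and all readable blocks yields the missing
        # block, since the parity is the XOR of all data blocks.
        recovered = bytearray(parity)
        for block in surviving_blocks:
            for i, byte in enumerate(block):
                recovered[i] ^= byte
        return bytes(recovered)

    # e.g. if the block from drive 41B failed:
    # block_b = recover_block([block_a, block_c, block_d], parity_p)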

[0660] Described next is how the present disk array device operates when the host device requests the disk group “A” to read real-time data, with reference to the flow chart of FIG. 47. Since the real-time data has to be processed in the disk array device in real time, the LUN composed of a set of “1” and “A” is set in the Identify message to be sent during the access request. Further, the host device transmits a Read_10 command to the disk array device.

[0661] As shown in the flow chart of FIG. 47, when receiving the SCSI message and the SCSI command requesting the real-time data to be read from the host device (step S171), the host interface 31 generates a fourth process request having the same format as that of the first process request based on the access request from the host device. The generated fourth process request is transmitted to the request rank identifying part 32 (step S172).

[0662] The request rank identifying part 32 extracts the information on the LUN from the received fourth process request (step S173). The request rank identifying part 32 identifies the level of priority of the received fourth process request, and determines which disk group is requested to be accessed (step S174). Since the set of “1” and “A” is extracted as the LUN from the present fourth process request, the request rank identifying part 32 identifies the level of priority as “priority” and the disk group as “A”. After the identification ends, the request rank identifying part 32 transmits the received fourth process request and the identification results (“priority” and the disk group “A”) to the controller 33 (step S175).

[0663] The controller 33 determines whether the fourth process request has priority or not by referring to the information on priority included in the received identification results (step S176).

[0664] When the information on priority is “priority”, the controller 33 replaces the information on the LUN in the present fourth process request with the received information on priority and the disk group, thereby converting the fourth process request into fifth read requests to the disk drives 41A to 41D. The controller 33 generates as many fifth read requests as there are disk drives 41A to 41D. Here, the fifth read request has the same format as that of the first read request (refer to FIG. 46). The controller 33 then enqueues the generated fifth read requests to the priority queues 342A to 342D assigned to the disk drives 41A to 41D, respectively, according to the information “priority” and the disk group “A” (step S177).

[0665] Each of the request selection units 35A to 35D executes processing as described above in step S178. Thus, the data blocks composing the requested real-time data are read in real time by the disk drives 41A to 41D.

[0666] Since the following steps S1710 to S1713 are the same as for reading of the non-real-time data, their description is omitted herein. However, the data to be processed in the disk array device is not non-real-time data but real-time data. Therefore, when the processing of step S1713 at the time of abnormal reading is executed, the controller 33 enqueues the generated fourth read request to the priority queue 342P (step S188).

[0667] As described above, the host device transmits the access request including the information on priority and others to the disk array device. Based on the received access request, the array controller 21 generates a request (read or write) for each of the disk drives 41A to 41D and 41P, and enqueues the request to a predetermined queue (non-priority queue or priority queue) according to its priority. Therefore, requests with higher priority are processed with priority in the disk array 22. Thus, when a higher-priority access request to be processed in real time and a lower-priority access request to be processed not necessarily in real time are both transmitted to the disk array device, processing of non-real-time data does not affect processing of real-time data.
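
The queueing discipline described above can be summarized in a short sketch: each disk drive has one priority queue and one non-priority queue, and whenever the drive becomes free, the request selection unit serves the priority queue first. The class and method names are assumptions for illustration; the patent defines the behavior, not this interface:

    from collections import deque

    class RequestSelectionUnit:
        def __init__(self):
            self.priority_queue = deque()      # e.g. queue 342A
            self.non_priority_queue = deque()  # e.g. queue 341A

        def enqueue(self, request, has_priority):
            if has_priority:
                self.priority_queue.append(request)
            else:
                self.non_priority_queue.append(request)

        def select_next(self):
            # Requests with higher priority are always served first;
            # non-priority requests are served only when no priority
            # request is waiting.
            if self.priority_queue:
                return self.priority_queue.popleft()
            if self.non_priority_queue:
                return self.non_priority_queue.popleft()
            return None  # the disk drive stays idle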

[0668] Described next is data reconstruction processing in the present disk array device. In the following description, a faulty disk drive is a disk drive in which a data block recorded therein has a fault, and reconstruction is processing of recovering a data block or redundant data in a faulty drive and rewriting the recovered data block or redundant data into a disk drive (another disk drive or a normal recording area in the faulty drive). The present disk array device executes two types of reconstruction: a first reconstruction processing prevents adverse effects on the processing of real-time data executed in the disk array device, while a second reconstruction processing guarantees the time limit of data reconstruction by devoting a predetermined part of the disk bandwidth to reconstruction with priority.

[0669] In these two types of reconstruction, a table storage part 39 shown in FIG. 49 is used. As shown in FIG. 49, the table storage part 39 stores managing tables 39A to 39D and 39P for the disk drives 41A to 41D and 41P (the disk group “A”). LBA statuses assigned over the entire recording area of each of the disk drives 41A to 41D and 41P are stored in the managing tables 39A to 39D and 39P, respectively. For example, the status of each LBA of the disk drive 41A is set in the corresponding section of the managing table 39A.

[0670] As shown in FIG. 50, the types of status include “normal”, “defective” (not shown in FIG. 50), “reconstruction-required”, and “under reconstruction”. The status “normal” indicates that the LBA is not defective. The status “defective” indicates that the LBA is defective. The status “reconstruction-required” indicates that the LBA is required to be reconstructed. The status “under reconstruction” indicates that the LBA is being reconstructed.

[0671] When detecting that one of the disk drives 41A to 41D and 41P has failed, the SCSI interfaces 36A to 36D and 36P first notify the controller 33 that the disk drive has become defective. Here, the faulty disk drive is detected when a failure notification is received from the disk drive, or when a response from the disk drives 41A to 41D and 41P does not return to the SCSI interfaces 36A to 36D and 36P within a predetermined time.

[0672] When detecting the faulty disk drive, the controller 33 accesses the table storage part 39, updating the managing table for the faulty disk drive and setting the status of each faulty LBA to “defective”. For example, when all of the recording areas in the faulty disk drive become defective, all of the LBA statuses are set to “defective”.

[0673] Described next is the first reconstruction processing when all of the LBA's in the disk drive 41A are defective. FIG. 51 is a flow chart showing the general procedure of the first reconstruction processing.

[0674] The controller 33 separates the faulty disk drive 41A from the disk group “A”, and puts a spare disk drive (not shown) into the disk group. Further, the controller 33 creates a managing table (not shown in FIG. 49) for the spare disk drive in the table storage part 39. In the newly created managing table, all LBA statuses are initially set to “reconstruction-required”. Furthermore, since the faulty disk drive 41A is replaced with the spare disk drive, the controller 33 assigns the non-priority queue 341A, the priority queue 342A, the request selection unit 35A, and the SCSI interface 36A to the spare disk drive.

[0675] The controller 33 then checks the first LBA of the new managing table (step S191). When the status of the first LBA is “reconstruction-required” (step S192), that LBA is to be processed. The controller 33 then accesses the queue managing part 34, determining whether or not the number of buffer areas currently used is less than a predetermined number “M”, and whether the number of requests for reconstruction enqueued to the non-priority queues 341A to 341D and 341P (described later) is less than a predetermined number “N” (step S193).

[0676] Step S193 prevents a large number of requests for reconstruction from being issued at the same time. There are two reasons why the number of outstanding requests has to be limited. The first reason is that a large number of outstanding requests increases the possibility that an access request from the host device having the same level of priority as the requests for reconstruction will be left unprocessed. For example, if the number of requests for reconstruction is kept less than “N”, it can be ensured that the access request from the host device will be processed after the Nth request at the latest. The predetermined number “N” is determined based on how many access requests from the host device with the same priority as the requests for reconstruction are to be processed during reconstruction processing.

[0677] The second reason is that a large number of outstanding requests may cause a shortage of memory (not shown) in the array controller 21. More specifically, a request for reconstruction requires memory (a buffer area) for storing information on the request, and also memory for storing data in write operation. Therefore, when the array controller 21 generates a large number of requests for reconstruction in a short time, a shortage of the memory (buffer areas) therein may occur. Further, with a shortage of the internal memory, the disk array device cannot receive any access request from the host device. For example, assuming that at most “M” buffer areas are used for storing the access requests from the host device, the array controller 21 stops generating requests for reconstruction when the number of remaining buffer areas becomes “M”. As evident from the above, the predetermined number “M” is determined according to the number of buffer areas used when the disk array device receives the maximum number of access requests from the host device.
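
A sketch of the gate applied in step S193, under the simplifying assumption that the two conditions can be read directly as counters (the patent states the conditions; the function and parameter names here are illustrative):

    def may_issue_reconstruction_request(buffers_in_use,
                                         outstanding_reconstruction, M, N):
        # A new request for reconstruction is generated only while
        # fewer than M buffer areas are in use and fewer than N
        # reconstruction requests are already enqueued.
        return buffers_in_use < M and outstanding_reconstruction < N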

[0678] The controller 33 waits until the conditions in step S193 are satisfied, and then executes the first reconstruction processing for the LBA to be processed (step S194). Here, when the conditions in step S193 are still satisfied after new reconstruction processing is activated, the controller 33 selects a new LBA to be processed, activating the next first reconstruction processing. In this way, the controller 33 continues activating the first reconstruction processing until the conditions in step S193 are no longer satisfied. Described next is the detailed procedure of step S194 with reference to the flow chart of FIG. 52.

[0679] The controller 33 first changes the status of the LBA to be processed from “reconstruction-required” to “under reconstruction” (step S201). The controller 33 generates sixth read requests for reading the data required for recovering, by the parity calculation, the data to be recorded in the LBA to be processed (hereinafter referred to as data for recovery). Here, in the first reconstruction processing, the data for recovery is not restricted to a data block, but is the data storable in one LBA. The controller 33 generates as many sixth read requests as there are disk drives 41B to 41D and 41P, excluding the faulty disk drive 41A and the spare disk drive. Each sixth read request has the same format as the first read request (refer to FIG. 46). The controller 33 enqueues the created sixth read requests to the non-priority queues 341B to 341D and 341P (step S202).

[0680] The request selection units 35B to 35D and 35P execute the same processing as that in step S169 (step S203). Therefore, the present sixth read requests are dequeued from the non-priority queues 341B to 341D and 341P by the request selection units 35B to 35D and 35P, and transmitted to the SCSI interfaces 36B to 36D and 36P. The disk drives 41B to 41D and 41P process the received sixth read requests to read the data for recovery (step S204). In this way, enqueued to the non-priority queues 341B to 341D and 341P, the present sixth read requests are processed by the disk drives 41B to 41D and 41P with lower priority. When completing reading, each of the disk drives 41B to 41D and 41P transmits a READ-COMPLETED, a signal indicating that reading has been completed, and the data for recovery to the SCSI interfaces 36B to 36D and 36P. Each data for recovery is stored in each of the buffer areas 37B_(i) to 37D_(i) and 37P_(i), like the data blocks composing non-real-time data or the like. Further, each READ-COMPLETED is transmitted through the SCSI interfaces 36B to 36D and 36P to the controller 33.

[0681] The controller 33 determines whether the data for recovery from the disk drives 41B to 41D and 41P has been stored in the buffer areas 37B_(i) to 37D_(i) and 37P_(i) according to the READ-COMPLETED's (step S205). If the data for recovery has been stored, the controller 33 instructs the parity calculator 38 to perform the parity calculation. Thus, the parity calculator 38 recovers the data to be recorded in the LBA to be processed, and stores the same in the buffer area 37R_(i) (step S206).

[0682] The controller 33 then fetches the data stored in the buffer area 37R_(i), generates a fifth write request for writing the data in the LBA to be processed, and then enqueues the same to the non-priority queue 341A assigned to the spare disk drive (step S207).

[0683] The request selection unit 35A executes the same processing as that in step S169 (step S208). Therefore, the present fifth write request is dequeued from the non-priority queue 341A by the request selection unit 35A, and transmitted to the SCSI interface 36A. The SCSI interface 36A processes the received fifth write request, and the spare disk drive writes the recovered data in the LBA to be processed (step S209). In this way, enqueued to the non-priority queue 341A, the present fifth write request is processed by the spare disk drive with lower priority. When completing the write operation, the spare disk drive transmits a WRITE-COMPLETED, a signal indicating that writing has been completed, to the controller 33 through the SCSI interface 36A.

[0684] At present, the status of the LBA to be processed is “under reconstruction” in the new managing table. When receiving the WRITE-COMPLETED from the spare disk drive (step S2010), the controller 33 updates the status to “normal” (step S2011). After step S2011, the controller 33 exits the processing of FIG. 52, thereby bringing the processing of one LBA to be processed in step S194 to an end. The controller 33 then determines whether all of the LBA's in the spare disk drive have been subjected to the processing of step S194 (step S195). The determination in step S195 is based on whether the status “reconstruction-required” is still present in the new managing table or not. When that status is present, the controller 33 selects the next LBA as the LBA to be processed (step S196), and executes a loop of steps S192 to S196 until all of the LBA's are subjected to the processing of step S194.

[0685] According to the above first reconstruction processing, the requests for data reconstruction (the sixth read requests and the fifth write request) are enqueued to the non-priority queues. This allows the disk array device to reconstruct data without affecting processing of the high-priority requests (second and fourth process requests).
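
The per-LBA state machine of the first reconstruction processing can be sketched as follows. The managing table is modeled as a list of status strings, and reconstruct_lba() stands in for steps S201 to S209 (read the data for recovery, recover it by the parity calculation, and write it to the spare drive with non-priority requests); all names are illustrative:

    def first_reconstruction(managing_table, reconstruct_lba, may_issue):
        # managing_table: list of status strings, one per LBA of the
        # spare disk drive, initially all "reconstruction-required".
        for lba, status in enumerate(managing_table):
            if status != "reconstruction-required":
                continue
            while not may_issue():   # throttle of step S193; a real
                pass                 # controller would block, not spin
            managing_table[lba] = "under reconstruction"   # step S201
            reconstruct_lba(lba)     # steps S202-S209, non-priority
            managing_table[lba] = "normal"                 # steps S2010-S2011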

[0686] Described next is the second reconstruction processing when all of the LBA's in the disk drive 41A are defective. FIG. 53 is a flow chart showing the general procedure of the second reconstruction processing. The flow chart of FIG. 53 is different from that of FIG. 51 only in that steps S193 and S194 are replaced with steps S211 and S212. Therefore, in FIG. 53, the steps corresponding to the similar steps in FIG. 51 are provided with the same step numbers as those in FIG. 51, and their description is omitted herein.

[0687] As in the first reconstruction processing, the faulty disk drive 41A is replaced with the spare disk drive. The non-priority queue 341A, the priority queue 342A, the request selection unit 35A, and the SCSI interface 36A are then assigned to that spare disk drive. Furthermore, a new managing table is created for the spare disk drive.

[0688] The controller 33 next executes steps S191 and S192 to select the LBA to be processed, and then determines whether a predetermined time T has elapsed since the previous execution of step S212 or not (step S211).

[0689] The bandwidth of each of the disk drives 41B to 41D and 41P and the spare disk drive is limited. Therefore, the more reconstruction processing the disk array device tries to execute, the more the access requests from the host device tend to be left unprocessed. In step S211, the frequency of reconstruction processing is limited to once in a predetermined time T, and thereby the array controller 21 controls adverse effects of the requests for reconstruction on the processing of the access requests. The array controller 21 executes the second reconstruction processing once in the predetermined time T as set. For example, assuming the number of LBA's required for reconstruction is “X” and the second reconstruction processing reconstructs the data of “Z” LBA's in “Y” minutes, the second reconstruction processing ends in X/(Z/Y) minutes. Further, the controller 33 generates one request for reconstruction every Y/Z minutes. That is, T is selected so that Z requests for reconstruction are generated in Y minutes.
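
A worked instance of this timing calculation, with assumed numbers (the patent defines only the relations X/(Z/Y) and T = Y/Z):

    # Assumed figures for illustration only.
    X = 600_000   # LBA's requiring reconstruction
    Y = 1.0       # minutes
    Z = 500       # LBA's reconstructed per Y minutes

    T = Y / Z                 # 0.002 min, i.e. one request every 0.12 s
    duration = X / (Z / Y)    # = X * Y / Z = 1200 minutes (20 hours)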

[0690] When determining in step S211 that the predetermined time T has elapsed, the controller 33 executes the second reconstruction processing for the LBA to be processed (step S212). FIG. 54 is a flow chart showing the detailed procedure of step S212. FIG. 54 is different from FIG. 52 only in that steps S202 and S207 are replaced with steps S221 and S222. Therefore, in FIG. 54, the steps corresponding to the steps in FIG. 52 are provided with the same step numbers as those in FIG. 52, and their description is simplified herein.

[0691] The controller 33 executes step S201, setting the status of the LBA to be processed to “under reconstruction” and generating four seventh read requests for reading the data for recovery. The controller 33 then enqueues the generated seventh read requests not to the priority queue 342A assigned to the spare disk drive, but to the priority queues 342B to 342D and 342P (step S221).

[0692] The request selection units 35B to 35D and 35P execute step S203, and in response thereto, the disk drives 41B to 41D and 41P execute step S204. Consequently, the seventh read requests are processed by the disk drives 41B to 41D and 41P with priority. When completing reading, the disk drives 41B to 41D and 41P transmit the read data for recovery and READ-COMPLETED's to the SCSI interfaces 36B to 36D and 36P. The SCSI interfaces 36B to 36D and 36P store the received data for recovery in the buffer areas 37B_(i) to 37D_(i) and 37P_(i), and transmit the received READ-COMPLETED's to the controller 33.

[0693] Then, with the execution of steps S205 and S206, the data to be recorded in the LBA to be processed (the same data recorded in the faulty disk drive 41A) is recovered.

[0694] The controller 33 then fetches the data stored in the buffer area 37R_(i), generating a sixth write request to write the data in the LBA to be processed and enqueuing the same to the priority queue 342A assigned to the spare disk drive (step S222).

[0695] The request selection unit 35A executes the same processing as in step S169 (step S208). Therefore, the present sixth write request is dequeued from the priority queue 342A by the request selection unit 35A and transmitted to the SCSI interface 36A. The SCSI interface 36A processes the received sixth write request, and the spare disk drive writes the recovered data in the LBA to be processed (step S209). In this way, enqueued to the priority queue 342A, the present sixth write request is processed by the spare disk drive with priority. When completing the write operation, the spare disk drive transmits a WRITE-COMPLETED, a signal indicating that writing has been completed, to the controller 33 through the SCSI interface 36A.

[0696] The controller 33 then executes steps S2010 and S2011, bringing the processing of step S212 to an end. Furthermore, the controller 33 executes the loop of steps S192 to S196 until all of the LBA's are subjected to the processing of step S212.

[0697] According to the second reconstruction processing, the requests for reconstruction (the seventh read requests and the sixth write request) are enqueued to the priority queues. This shortens the time each request waits to be processed in the queue managing part 34, thereby guaranteeing the time by which the data is fully reconstructed. Furthermore, the array controller 21 enqueues each request and controls the second reconstruction processing for each disk drive individually, thereby performing the second reconstruction processing effectively.

[0698] Described next is how the disk array device operates when the host device requests access to the LBA “reconstruction-required”, or when the status of the LBA recording the data blocks for update in FIG. 44 is “reconstruction-required”.

[0699] By referring to the table storage part 39 when reading a data block, the controller 33 can determine whether the LBA recording the data block is to be subjected to reconstruction processing or not. That is, when the status of the LBA to be accessed is “reconstruction-required”, the controller 33 can recognize that data cannot be read from the LBA. The controller 33 then accesses the table storage part 39, changing the status of the LBA to be processed to “under reconstruction” and generating read requests for reading the data for recovery required for recovering the data recorded in the LBA to be processed. The controller 33 enqueues the generated read requests to the non-priority queues or priority queues assigned to the disk drives other than the faulty disk drive. If the priority information indicative of “priority” is set in the access request from the host device, the controller 33 enqueues the read requests to the priority queues. If the priority information indicative of “non-priority” is set, the controller 33 enqueues the read requests to the non-priority queues.

[0700] Thereafter, the data for recovery is read from the disk drives except the faulty disk drive, and stored in predetermined buffer areas in the buffer managing part 37. When all of the data for recovery is stored in the buffer areas, the controller 33 causes the parity calculator 38 to perform the parity calculation, recovering the data to be recorded in the LBA to be processed. With the recovered data, the controller 33 continues the processing for transmitting the data to the host device, and also generates a seventh write request for writing the recovered data in the LBA to be processed. The seventh write request is enqueued to the non-priority queue assigned to the disk drive including this LBA. When the recovered data is written in the disk drive, the controller 33 accesses the table storage part 39, changing the status of the LBA to “normal”.

[0701] Described next is how the disk array device operates when writing data to the LBA “reconstruction-required” in the first or second reconstruction processing. In this case, the operation is similar to that described in FIG. 44, except for the following two points. First, when the controller 33 generates write requests to the disk drives 41A to 41D and 41P, the controller 33 confirms that the status of the LBA to be accessed is “reconstruction-required”, and then changes the status to “under reconstruction”. Second, when the disk drive including the LBA “under reconstruction” completes writing, the controller 33 changes the status of the LBA to “normal”.

[0702] As described above, when the host device requests access to the LBA “reconstruction-required” in the newly-created managing table, the disk array device writes the data recovered by the parity calculation in the LBA. The write request for this writing is enqueued to the non-priority queue. Therefore, the recovered data is written in the disk array 22 with lower priority, together with the access requests from the host device. As described above, the LBA “reconstruction-required” is subjected to the first or second reconstruction processing. However, the first and second reconstruction processings are executed in parallel, decreasing the number of LBA's “reconstruction-required” in either processing. This shortens the time required for the first or second reconstruction processing. Furthermore, since the seventh write request is enqueued to the non-priority queue, it can be ensured that writing of the recovered data does not affect other processing with higher priority to be executed by the disk array device.

[0703] When the host device requests access to the LBA “reconstruction-required” for writing data, the controller 33 changes the status of the LBA to “normal” when the disk array device completes writing. Therefore, the disk array device is not required to execute unnecessary reconstruction processing, and the processing time in the disk array device can be shortened.

[0704] Further, although the disk array device is constructed based on the RAID-3 and RAID-4 architectures in the present embodiment, the disk array device may have the RAID-5 architecture. Furthermore, the present embodiment can be applied even to a disk array device with the RAID-1 architecture.

[0705] Still further, although the disk array device includes one disk group in the present embodiment, the disk array device may include a plurality of disk groups. Moreover, although the host device specifies priority using the LUN in the present embodiment, information indicative of priority may instead be added to the LUN, with higher priority given to the request if the first bit of the LUN is “1”.

[0706] Still further, although two levels of priority are defined in the disk array device according to the present embodiment, three or more levels of priority may be defined. In this case, the number of queues is determined according to the number of levels of priority. Also in this case, the request generated in the first reconstruction processing is preferably enqueued to a queue with lower priority than the queue to which a request for non-real-time data is enqueued. The first reconstruction processing is thus executed without affecting processing of non-real-time data. On the other hand, the request generated in the second reconstruction processing is preferably enqueued to a queue with higher priority than the queue to which a request for real-time data is enqueued. The second reconstruction processing is thus executed without being affected by the processing of real-time data and non-real-time data, and thereby the end time of the second reconstruction processing can be ensured more reliably.

[0707] Still further, when the host device always requests processing exclusively for either real-time data or non-real-time data, it is not required to set priority information in the access request, and thus the request rank identifying part 32 is not required. Further, although the first and second reconstruction processings are independently executed in the present embodiment, if they are executed simultaneously, more effective reconstruction can be achieved while ensuring its end time.

Ninth Embodiment

[0708] In a ninth embodiment, as in the previous embodiments, real-time data is data to be processed in real time in the disk array device.

[0709] FIG. 55 is a block diagram showing the structure of a disk array device 51 according to the ninth embodiment of the present invention. In FIG. 55, the disk array device 51 is constructed by the architecture of a predetermined RAID level, including a disk group 61 and a disk controller 71. The disk array device 51 is communicably connected to a host device 81 placed outside.

[0710] The disk group 61 is typically composed of a plurality of disk drives 62. A logical block address (LBA) is previously assigned to each recording area of each disk drive 62. Each disk drive 62 manages its own entire recording area by blocks (generally called sectors) of a predetermined fixed length (generally 512 bytes). Each disk drive 62 reads or writes redundant data (that is, sub-segments and parity). Note that the disk group 61 may also be composed of only one disk drive 62.

[0711] The disk controller 71 includes a host interface 72, a read/write controller 73, a disk interface 74, and a reassignment part 75. The host interface 72 is an I/O interface between the disk array device 51 and the host device 81, structured conforming to SCSI (Small Computer System Interface) in the present embodiment. SCSI is described in detail in Japan Standards Association X6053-1996 and others, but is not directly related to the present invention, and therefore its detailed description is omitted herein. The read/write controller 73, communicably connected to the host interface 72, controls reading or writing of the redundant data over the disk group 61 according to the I/O request SR from the host device 81. The disk interface 74, communicably connected to the read/write controller 73, is an I/O interface between the disk controller 71 and the disk group 61. In the present embodiment, this interface also conforms to SCSI.

[0712] The reassignment part 75 is a component unique to the present disk array device 51, communicably connected to the disk interface 74. The reassignment part 75 monitors delay time calculated from a predetermined process start time, and by referring to first and second lists 751 and 752 created therein, finds the disk drive 62 having a defective (faulty) area and instructs that disk drive 62 to execute processing of assigning an alternate area to the defective area (reassign processing).

[0713] Described next are the general outlines of input/output of data between the host device 81 and the disk array device 51. The host device 81 transmits an I/O request signal SR to the disk array device 51 to request input/output of real-time data. The host device 81 and the disk array device 51 may communicate a plurality of pieces of real-time data simultaneously. The host device 81 requests input/output of the real-time data in units of data (segment data) of a predetermined size into which each piece of data is divided. This allows the disk array device to input/output the plurality of pieces of real-time data in parallel. This parallel processing contributes to input/output of data in real time.

[0714] For example, when requesting input/output of first and second real-time data, the host device 81 first transmits an I/O request SR 1 for one segment composing the first real-time data, and then an I/O request SR 2 for one segment composing the second real-time data, and this operation is repeated. In other words, the segments of each piece of real-time data are regularly processed so that one segment of the first real-time data and one segment of the second real-time data are alternately processed.

[0715] Described next is the operation of the read/write controller 73 in the disk array device 51 with reference to a flow chart of FIG. 56. The read/write controller 73 receives an I/O request SR from the host device 81 through the host interface 72 (step S231). This I/O request SR specifies the recording area of one segment, generally using the LBA. The read/write controller 73 then converts the I/O request SR according to the RAID architecture to generate an I/O request SSR for each sub-segment. The relation between a segment and a sub-segment is now described. A segment is divided into a plurality of sub-segments according to the RAID architecture, and these sub-segments are distributed over the disk drives 62. Further, the sub-segments may be made redundant in the disk controller 71 to cope with failure of one disk drive 62, according to the level of the RAID. Furthermore, parity generated in the disk controller 71 may be recorded in only one disk drive 62.
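
How a segment might be divided and made redundant can be sketched as follows, using RAID-3-style striping with one parity sub-segment; the actual division depends on the RAID level used, and the function name is illustrative:

    def split_segment(segment, n_drives):
        # Divide one segment into n_drives equal sub-segments and
        # compute one parity sub-segment by bytewise XOR. Assumes
        # len(segment) is divisible by n_drives.
        size = len(segment) // n_drives
        subs = [segment[i * size:(i + 1) * size] for i in range(n_drives)]
        parity = bytearray(size)
        for sub in subs:
            for i, byte in enumerate(sub):
                parity[i] ^= byte
        return subs, bytes(parity)

    # subs, parity = split_segment(segment, 4)
    # -> one I/O request SSR per sub-segment, plus one for the parity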

[0716] The read/write controller 73 transmits an I/O request SSR for each sub-segment to each of the disk drives 62 through the disk interface 74 (step S232). At this time, the read/write controller 73 transmits an I/O request for parity, as required. The interface between the disk controller 71 and the disk group 61 conforms to SCSI, and the sub-segments are recorded in a successive LBA area in each disk drive 62. Therefore, the read/write controller 73 is required to generate only one SCSI command (READ or WRITE) as the I/O request SSR for these sub-segments. The I/O request SSR specifies the successive LBA area. These steps S231 and S232 are executed whenever an event of receiving an I/O request occurs.

[0717] Each disk drive 62 accesses the successive LBA area specified by the I/O request SSR to read or write the sub-segments. When reading or writing ends normally, the disk drive 62 returns a response RES to the received I/O request SSR to the disk controller 71. The read/write controller 73 receives the response RES from each disk drive 62 through the disk interface 74. When the host device 81 requests a write operation, the read/write controller 73 notifies the host device 81 through the host interface 72 that writing has been completed. When the host device 81 requests a read operation, the read/write controller 73 transmits all of the read sub-segments at once as a segment to the host device 81.

[0718] The sub-segments are recorded in the successive LBA area in each disk drive 62, thereby being successively transmitted in real time between the disk controller 71 and each disk drive 62. In other words, overhead (typically, seek time plus rotational latency) in each disk drive 62 is within a range of a predetermined time T_(L) during which input/output in real time is not impaired. However, in the conventional disk array device, reassign processing is executed for each fixed-length block in the disk drive, and therefore a fixed-block in part of the successive LBA area may be subjected to reassign processing. As a result, even if the sub-segments after reassignment are recorded in the successive LBA area, the physical recording areas of the sub-segments are distributed over the disk drive (fragmentation of sub-segments), and the overhead in the disk drive 62 becomes long. As a result, the capability of input/output in real time in the conventional disk array device is impaired after reassignment. Therefore, the reassignment part 75 in the present disk array device 51 executes the processing of the flow charts shown in FIGS. 57 to 59 to maintain its capability for input/output in real time.

[0719] The disk interface 74 transmits a signal “transmission notification” to the reassignment part 75 whenever the disk interface 74 transmits an I/O request SSR to a disk drive 62. This transmission notification includes the ID specifying the transmitted I/O request SSR, and the successive LBA area specified by the I/O request SSR. The reassignment part 75 executes the flow chart of FIG. 57 whenever it receives such a transmission notification. Here, assume that the reassignment part 75 receives the transmission notification including the ID “b” and the successive LBA area “a”, and that this transmission notification is generated due to the I/O request SSR 1. The reassignment part 75 has a time-of-day clock, and detects a receive time T_(T1) (that is, the transmission time of the I/O request SSR 1) when the transmission notification is received. The reassignment part 75 also extracts the ID “b” and the successive LBA area “a” from the transmission notification (step S241).

[0720] The reassignment part 75 creates and manages a first list 751 and a second list 752 therein. The first list 751, created for each disk drive 62, includes, as shown in FIG. 60(a-1), fields of the ID, the LBA (successive LBA area), and the process start time. In the first list 751, the ID, LBA and process start time are registered for each I/O request SSR together with the order of transmission of the I/O requests to the corresponding disk drive 62. The order of transmitting the I/O requests is indicated by an arrow in FIG. 60(a-1). As indicated by the arrow, the information on a newer I/O request is registered toward the front of the first list 751, while the information on an older I/O request is registered toward the back. The second list 752 includes, as shown in FIG. 60(b-1), fields of the successive LBA area in which the sub-segment is stored and the counter. In the second list 752, the successive LBA area and the counter value of the counter are registered.
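
A possible in-memory representation of the two lists, purely as an illustration (field names and types are assumptions):

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class FirstListEntry:            # one entry of a first list 751
        request_id: str              # ID of the I/O request SSR, e.g. "b"
        lba_area: str                # successive LBA area, e.g. "a"
        start_time: Optional[float]  # process start time; None until set

    @dataclass
    class SecondListEntry:           # one entry of the second list 752
        lba_area: str                # successive LBA area
        counter: int                 # N: consecutive T_D > T_L detections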

[0721] After step S241, the reassignment part 75 determines whether plural I/O requests SSR have been sent to the target disk drive 62 (that is, the target disk drive of the present I/O request SSR) (step S242). The first list 751 for each disk drive 62 includes only the I/O requests SSR that have been transmitted to that disk drive. The reassignment part 75 refers to these first lists 751 for the determination in step S242.

[0722] When determining that plural I/O requests are not present for the target disk drive 62, the reassignment part 75 registers the successive LBA area “a” and the ID “b” extracted in step S241 in the first list 751, and also registers the transmission time T_(T1) detected in step S241 as the process start time in the first list 751 (step S243). As a result, information as shown in FIG. 60(a-2) is registered in the first list 751 for the present I/O request SSR.

[0723] When it is determined that plural I/O requests are present, not only the present I/O request SSR but also at least one other I/O request transmitted immediately before the present one has been sent to the target disk drive 62. In this case, the process start time for the present I/O request is the time when the reassignment part 75 receives a response to the immediately preceding I/O request (described later in detail).

[0724] When the event “transmission notification received” occurs, the processing in step S241 is executed. Therefore, the flow chart of FIG. 57 is event-driven. In addition to the procedure shown in FIG. 57, the reassignment part 75 also executes the procedure shown in the flow chart of FIG. 58 during operation of the disk array device 51. The reassignment part 75 monitors whether the delay time T_(D) exceeds the limit time T_(L) for the ID recorded in each first list 751 (that is, each I/O request SSR) to detect a defective recording area (step S251). Note that, in step S251, the reassignment part 75 does not monitor an I/O request SSR for which the process start time has not yet been registered. The delay time T_(D) is the time between the registered process start time and the present time T_(P). Predetermined in the present disk array device 51, the limit time T_(L) is an indicator for determining whether the successive LBA area in the disk drive 62 includes a defective fixed-block, and also for determining whether input/output of the sub-segment in real time can be satisfied. That is, when the delay time T_(D) exceeds the limit time T_(L), the reassignment part 75 assumes that the successive LBA area may possibly include a defective fixed-block.

[0725] Described next is the processing in step S251 in detail, taking the ID “b” as an example. In the first list 751 (refer to FIG. 60(a-2)), the I/O request SSR 1 is specified by the ID “b”, and its delay time T_(D1) can therefore be given by T_(P)−T_(T1). When T_(D1)>T_(L) is satisfied, the procedure advances to step S252. When not satisfied, the reassignment part 75 executes the processing in step S251 again to find an ID requiring reassignment. Note again that, in step S251, the reassignment part 75 does not monitor an I/O request SSR for which the process start time has not yet been registered.
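
The monitoring of step S251 amounts to the following check, reusing the FirstListEntry sketched above; the value of T_L and the function name are illustrative:

    import time

    T_L = 0.5  # limit time in seconds; an assumed value

    def find_overdue_request(first_list):
        # Scan only the entries whose process start time is registered,
        # and report the first one whose delay time
        # T_D = T_P - (process start time) exceeds the limit time T_L.
        now = time.monotonic()  # present time T_P
        for entry in first_list:
            if entry.start_time is None:
                continue        # not monitored until its start time is set
            if now - entry.start_time > T_L:
                return entry    # candidate for termination (step S252)
        return None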

[0726] When determining in step S251 that T_(D1)>T_(L) is satisfied for the ID “b”, the reassignment part 75 instructs the disk interface 74 to terminate execution of the I/O request SSR 1 specified by the ID “b” (step S252). In response to this instruction, the disk interface 74 transmits an ABORT_TAG message, which is one of the SCSI messages, to terminate execution of the I/O request SSR 1. The disk interface 74 then notifies the read/write controller 73 that the processing of the I/O request SSR 1 has failed. In response, the read/write controller 73 executes the processing which will be described later.

[0727] After step S252, the reassignment part 75 checks whether another I/O request SSR waits to be processed in the disk drive 62 which has terminated execution of the I/O request SSR 1, by referring to the first list 751 (step S253). Since the first list 751 is created for each disk drive 62, the reassignment part 75 determines that another I/O request SSR waits if an ID other than “b” is registered. The process start time of such another I/O request SSR has not yet been registered in the first list 751. Therefore, when finding an ID other than the ID “b” in the first list 751, as shown in FIG. 60(a-3), the reassignment part 75 registers the present time as the process start time for the I/O request to be processed following the I/O request SSR 1 (step S254). On the other hand, when the reassignment part 75 does not find another ID in step S253, the procedure skips step S254 and advances to step S255.

[0728] The reassignment part 75 then fetches the successive LBA area “a” from the first list 751 by referring to the ID “b”. The reassignment part 75 then determines whether a counter has been created for the successive LBA area “a”, to check whether it has already been determined in succession that there is a high possibility of a defective fixed-block being included in the successive LBA area “a” (step S255). The counter value N, indicating how many times T_(D)>T_(L) has been successively satisfied, is registered in the field of the counter in the second list 752. Since the second list 752 is created for every successive LBA area, if the counter has been created, it was determined in a previous check that there is a high possibility of a defective fixed-block being included in the corresponding successive LBA area (that is, it has been successively determined that T_(D)>T_(L) is satisfied). On the other hand, if the counter has not been created, it is determined for the first time that there is a high possibility of a defective fixed-block being included in the successive LBA area. Here, assuming that the counter has not been created for the successive LBA area “a”, the reassignment part 75 newly creates the second list 752, registering “a” for the successive LBA area and “1” for the corresponding counter, as shown in FIG. 60(b-2) (step S256). When it is determined in step S255 that the counter has been created, the procedure advances to step S259.

[0729] After step S256, the reassignment part 75 next determines whether or not the counter value N has reached the limit value N_(L) (step S257). The limit value N_(L) is predetermined in the present disk array device 51, and serves as a threshold for determining that all or part of the fixed-blocks in the successive LBA area is defective. The limit value N_(L) is a natural number of 1 or more, determined in view of input/output in real time according to the specifications of the present disk array device 51. In the present embodiment, assume that "2" is selected for the limit value N_(L). Since the counter value N of the successive LBA area "a" is "1" (refer to FIG. 60(b-2)), the procedure advances to step S258. When the counter value N reaches the limit value N_(L), the procedure advances to step S2510, which will be described later.

[0730] The reassignment part 75 deletes the ID "b", the successive LBA area "a", and the process start time T_(T1) from the first list 751 (step S258). This processing prevents the counter from being redundantly incremented for the I/O request SSR 1 specified by the ID "b", the successive LBA area "a", and the process start time T_(T1). Note that the successive LBA area "a" and the counter value N in the second list 752 are not deleted. Therefore, when another I/O request specifies the successive LBA area "a", whether this successive LBA area "a" includes a defective fixed-block is still correctly checked. That is, if the successive LBA area "a" and the counter value N in the second list 752 were deleted, it could not be determined whether the counter value N reaches the limit value N_(L), and reassign processing could not be executed correctly.

[0731] As described above, a response RES 1 to the I/O request SSR 1 returns from the disk drive 62 through the disk interface 74 to the read/write controller 73. The response RES 1 includes the successive LBA area "a", information indicating read or write, and the ID "b" of the I/O request SSR 1. The disk interface 74 transmits a receive notification to the reassignment part 75 whenever it receives a response RES to an I/O request SSR. In response to the receive notification, the reassignment part 75 executes the processing in steps S261 to S267 shown in FIG. 59, which will be described later.

[0732] When the response RES 1 indicates that writing has failed, the read/write controller 73 generates an I/O request SSR 1′ including the same information as the I/O request SSR 1 to retry registering the sub-segment in the successive LBA area "a", and then transmits it to the disk drive 62. When the response RES 1 indicates that reading has failed, the read/write controller 73 recovers the unread sub-segment by using parity and other sub-segments according to the RAID architecture, or retries registering the sub-segment as described above.

[0733] The disk interface 74 transmits a transmission notification of the I/O request SSR 1′ to the reassignment part 75. This transmission notification includes the ID "c" and the successive LBA area "a". The reassignment part 75 detects the receive time of the transmission notification (the process start time T_(T1)′ of the I/O request SSR 1′) and also extracts the ID "c" and the successive LBA area "a" from the transmission notification (step S241 of FIG. 57).

[0734] The reassignment part 75 then refers to the first list 751 to determine whether or not plural I/O requests SSR have been sent to the target disk drive 62 (the destination of the I/O request SSR 1′) (step S242). If only one I/O request SSR, that is, only the I/O request SSR 1′, has been sent, the reassignment part 75 registers the successive LBA area "a", the ID "c", and the process start time T_(T1)′ obtained in step S241 in the first list 751 (step S243), and then ends the processing of FIG. 57. As a result, the first list 751 becomes as shown in FIG. 60(a-4). On the other hand, if an I/O request SSR other than the I/O request SSR 1′ has been sent, the reassignment part 75 registers only the successive LBA area "a" and the ID "c" extracted in step S241 (step S244), and then ends the processing of FIG. 57. In this case, the first list 751 becomes as shown in FIG. 60(a-5).
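
The branch between steps S243 and S244 can be sketched as follows, continuing the same hypothetical structures: the process start time is registered only when no other request is queued at the drive.

    def on_transmission_notification(req_id, lba_area, first_list):
        """Steps S241-S244 of FIG. 57: register a newly transmitted I/O request
        SSR in the per-drive first list 751."""
        only_request = not first_list  # no other I/O request SSR has been sent
        first_list[req_id] = {
            "lba_area": lba_area,
            # step S243 registers the start time; step S244 leaves it
            # unregistered because an earlier request is still in progress
            "process_start": time.monotonic() if only_request else None,
        }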

[0735] When the processing of FIG. 57 ends, the reassignment part 75 executes the flow chart of FIG. 58. When T_(D1)′ (the present time T_(P) minus the process start time T_(T1)′) exceeds the limit time T_(L) for the registered process start time T_(T1)′, the reassignment part 75 executes the above-described processing of steps S252 to S254, whose description is omitted herein. The reassignment part 75 then checks whether the counter has been created for the successive LBA area "a" corresponding to the process start time T_(T1)′ (step S255). In the present second list 752, as shown in FIG. 60(b-2), the counter has already been created for the successive LBA area "a", and it was therefore determined at the previous check (that is, at the time of transmission of the I/O request SSR 1) that there is a high possibility of including a defective fixed-block. Therefore, the reassignment part 75 increments the counter value N by "1", as shown in FIG. 60(b-2) (step S259).

[0736] As described above, assume herein that the limit value N_(L) is "2". Since the counter value N is now "2", the reassignment part 75 determines in step S257 that the successive LBA area "a" includes a defective fixed-block, and instructs reassignment. The reassignment part 75 produces a REASSIGN_BLOCKS command (refer to FIG. 61), which is one of the SCSI commands, for specifying the successive LBA area including the defective fixed-block. The reassignment part 75 specifies the successive LBA area "a" in the defect list of the REASSIGN_BLOCKS command, and transmits the REASSIGN_BLOCKS command through the disk interface 74 to the disk drive 62, instructing reassignment (step S2510).
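
For illustration, the data-out buffer of a SCSI REASSIGN BLOCKS command with a short-format defect list is a four-byte header (two reserved bytes and a two-byte defect list length) followed by one four-byte big-endian LBA per defective block. A sketch under that assumption, with hypothetical LBA values:

    import struct

    def build_reassign_blocks_data(lbas):
        """Assemble the defect list for a SCSI REASSIGN BLOCKS command:
        2 reserved bytes, a 2-byte defect list length, then 4-byte LBAs."""
        body = b"".join(struct.pack(">I", lba) for lba in lbas)
        return struct.pack(">2xH", len(body)) + body

    # e.g. the fixed-blocks composing the successive LBA area "a"
    data_out = build_reassign_blocks_data(range(0x1000, 0x1008))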

[0737] As the alternate area, the disk drive 62 assigns a fixed-block having a physical address which allows successive data transmission to the successive LBA area specified by the REASSIGN_BLOCKS command, and then returns an affirmative response ACK 1, a signal indicating the end of reassignment, to the disk controller 71. As in the present embodiment, when the disk controller 71 instructs the disk drive 62 with the REASSIGN_BLOCKS command to execute reassignment, the physical address to which the sub-segment is reassigned changes in the disk drive 62, but the logical block address (LBA) of the sub-segment remains unchanged even after reassignment. Therefore, the disk controller 71 does not have to store a new LBA for the sub-segment after reassignment.

[0738] Described next is the physical address of an alternate recording area which allows successive data transmission in the disk drive 62. With such a physical address, the above-described overhead can be shortened so as to satisfy input/output in real time. Examples of the alternate recording areas in the disk drive 62 (that is, of each fixed-block composing the successive LBA area specified by the REASSIGN_BLOCKS command) are as follows:

[0739] 1. Fixed-blocks whose physical addresses are close to each other;

[0740] 2. Fixed-blocks having successive physical addresses;

[0741] 3. Fixed-blocks on the same track (or cylinder);

[0742] 4. Fixed-blocks on tracks close to each other; and

[0743] 5. Fixed-blocks on the track (or cylinder) close to the track (or cylinder) with the defective block assigned thereto.

[0744] When a successive LBA area including such fixed-blocks as listed above is specified, the disk drive 62 can, as a natural consequence, successively transmit the requested sub-segment in real time to the disk controller 71.

[0745] With the affirmative response ACK 1, the disk drive 62 notifies the disk controller 71 of the end of reassignment. When receiving the affirmative response ACK 1, the disk interface 74 transfers it to the reassignment part 75 and the read/write controller 73. When the reassignment part 75 receives the affirmative response ACK 1, the procedure advances from step S2510 to step S2511. Since the successive LBA area "a" included in the affirmative response ACK 1 has been reassigned, the reassignment part 75 deletes the successive LBA area "a" and the counter value from the second list 752 (step S2511), and also deletes the entry of the first list 751 including the successive LBA area "a", the ID "c", and the process start time T_(T1)′ (step S2512). The procedure then returns to step S251.

[0746] On receiving the affirmative response ACK 1, the read/write controller 73 instructs the disk drive 62 subjected to reassignment to write the sub-segment when the I/O request SSR 1′ requests a write operation. When the I/O request SSR 1′ requests a read operation, the read/write controller 73 recovers the sub-segment lost by reassignment using parity and other sub-segments according to the RAID architecture, and then transmits the recovered sub-segment to the host device 81 through the host interface 72 and also instructs the disk drive 62 through the disk interface 74 to write the recovered sub-segment. Thus, the data recorded in the disk drive 62 can maintain consistency before and after reassignment.

[0747] As described above, the essentials of the present disk array device are the timing of reassignment and the physical address of the alternate area. For easy understanding of these essentials, the operation of the reassignment part 75 when the response RES 1 is received by the disk controller 71 has been described above with some parts omitted. That is, when the response RES 1 returns to the disk controller 71, the contents of the first list 751 vary according to the return time of the response RES 1 and the type of the response RES (read or write). Described below is the operation of the reassignment part 75 when the response RES 1 returns to the disk controller 71.

[0748] The disk interface 74 generates a signal "receive notification" whenever it receives a response RES to an I/O request SSR, and transmits it to the reassignment part 75. This receive notification includes the ID and successive LBA area of the I/O request on which the received response RES is based. The reassignment part 75 executes the flow chart of FIG. 59 whenever it receives a receive notification. Now, assume herein that the disk interface 74 generates the receive notification on receiving the response RES 1 and transmits it to the reassignment part 75. The response RES 1 includes, as is evident from the above, the ID "b", the successive LBA information "a", and information on whether the operation is read or write. Note that the information on whether the operation is read or write is not required by the reassignment part 75. Therefore, the receive notification includes only the ID "b" and the LBA "a".

[0749] The reassignment part 75 checks whether or not the ID "b" has been registered in the first list 751 (step S261). If the ID "b" has not been registered in the first list 751 even though the I/O request SSR 1 has been transmitted, that means that the ID "b", the successive LBA area "a", and the process start time T_(T1) were deleted in step S258 or S2512 of FIG. 58. Therefore, not being required to change (update or delete) the first list 751, the reassignment part 75 ends the processing of FIG. 59.

[0750] On the other hand, if the ID "b" has been registered in the first list 751 in step S261, that means that T_(D1) > T_(L) was not satisfied in step S251 (refer to FIG. 58) before the receive notification was received (that is, before the response RES was returned). Therefore, the reassignment part 75 determines whether T_(D1) > T_(L) is satisfied at present, in the same manner as in step S251 (step S262). When the delay time T_(D1) exceeds the limit time T_(L), it is required to determine whether or not reassignment should be instructed, and the procedure therefore advances to step S253 of FIG. 58 and the subsequent steps, as shown by A in FIG. 59.

[0751] On the other hand, when the delay time T_(D1) does not exceed the limit time T_(L), that means that the response RES 1 was received by the disk controller 71 before a lapse of the limit time T_(L). That is, the successive LBA area "a" does not include a defective fixed-block. Therefore, the reassignment part 75 checks whether the counter has been created for the successive LBA area "a" in the second list 752 (step S263). If the counter has been created, the reassignment part 75 deletes the successive LBA area "a" and the counter value N from the second list 752 (step S264), and then deletes the ID "b" and the process start time T_(T1) from the first list 751 (step S265). On the other hand, if the counter has not been created, the reassignment part 75 deletes only the ID "b" and the process start time T_(T1) from the first list 751 (step S265).

[0752] The reassignment part 75 next determines whether or not another I/O request SSR has been sent to the target disk drive 62 (the disk drive 62 which transmitted the present response RES 1) (step S266). The I/O requests SSR transmitted to the target disk drive 62 are written in the first list 751, so the reassignment part 75 can make the determination in step S266 by referring to the first list 751. When such an I/O request is present, as shown in FIG. 60(a-5), the first list 751 includes the ID and the successive LBA area of that I/O request, but does not include its process start time. Therefore, the reassignment part 75 registers the present time as the process start time of the I/O request SSR to be processed next in the disk drive 62 (step S267), and then ends the processing of FIG. 59. The present time here is the time when a response RES to one I/O request SSR returns from the disk drive 62 to the disk controller 71, and is also the time when the disk drive 62 starts processing the I/O request SSR sent next. That is, the present time registered as the process start time is the time when processing of the next I/O request SSR in the disk drive 62 starts.
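
Steps S261 to S267 can be summarized in one sketch, again using the hypothetical structures introduced earlier; handle_overdue is a hypothetical hook standing in for the continuation at step S253 of FIG. 58:

    def on_receive_notification(req_id, lba_area, first_list, second_list,
                                handle_overdue, limit=T_L):
        """Steps S261-S267 of FIG. 59: update the lists when a response RES returns."""
        entry = first_list.get(req_id)
        if entry is None:
            return  # step S261: entry already deleted in step S258 or S2512
        t_p = time.monotonic()
        if entry["process_start"] is not None and t_p - entry["process_start"] > limit:
            handle_overdue(req_id)  # step S262: continue at step S253 of FIG. 58
            return
        second_list.pop(lba_area, None)  # steps S263/S264: discard a sporadic counter
        del first_list[req_id]           # step S265
        for other in first_list.values():  # steps S266/S267: next request starts now
            if other["process_start"] is None:
                other["process_start"] = t_p
                break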

[0753] In some cases, the reassignment part 75 may erroneously determine that the successive LBA area "a" possibly includes a defective fixed-block, and create a counter, due to thermal asperity, thermal calibration, and other phenomena occurring in the disk drive 62, even though the successive LBA area "a" in fact includes no defective fixed-block and is composed of normal fixed-blocks. If the information on such a successive LBA area "a" composed of normal fixed-blocks remained registered in the second list 752 for a long time, the reassignment part 75 might instruct unnecessary reassignment. In step S263, if the counter has been created, that means that the reassignment part 75 once determined that the successive LBA area "a" possibly includes a defective area. Therefore, the reassignment part 75 deletes the successive LBA area "a" and the counter value N from the second list 752 (step S264), and then executes steps S265 to S267 to end the processing of FIG. 59.

[0754] As described above, according to the present embodiment, the reassignment part 75 in the disk controller 71 monitors the delay time T_(D) of the response RES to each I/O request SSR from the process start time of each I/O request SSR, determining whether to instruct the disk drive 62 to execute reassignment based on the calculated delay time T_(D). Here, the process start time is the time when an I/O request SSR is transmitted to a disk drive 62 if the number of I/O requests SSR sent to that disk drive is one. When plural I/O requests SSR are sent to a disk drive, the process start time is the time when the disk controller 71 receives the response RES to the I/O request SSR processed immediately before the present I/O request SSR. By controlling the reassign timing in this manner, even if the recording area of the sub-segment is accessible after several retries by the disk drive, the reassignment part 75 assumes that the delay in response has become large (that is, that input/output in real time cannot be satisfied), and instructs execution of reassignment. That is, the disk array device 51 can instruct execution of reassignment with such timing as to suppress delays in response.

[0755] Further, a long delay in the response RES to one I/O request SSR affects processing of the following I/O requests SSR. That is, a delay in response to the following I/O requests SSR occurs in the same disk drive 62, with the adverse effect that transmission of the following responses RES in real time cannot be satisfied. Therefore, the reassignment part 75 monitors the delay time T_(D) of each I/O request SSR and, when the delay time T_(D) exceeds the limit time T_(L), terminates execution of processing of that I/O request SSR. Thus, even if processing of one I/O request is delayed, the delay does not affect processing of the following I/O requests SSR.

[0756] Still further, the reassignment part 75 in step S251 of FIG. 58 determines whether or not the successive LBA area includes a defective fixed-block, using the criterion T_(D) > T_(L). The reassignment part 75, however, does not instruct reassignment immediately after determining that T_(D) > T_(L) is satisfied, but instructs it with a REASSIGN_BLOCKS command after successively determining a predetermined number of times that T_(D) > T_(L) is satisfied. Thus, even if it is erroneously and sporadically determined, due to thermal asperity, thermal calibration, and other phenomena, that a successive LBA area which in fact includes only normal blocks includes a defective block, the reassignment part 75 can prevent an unnecessary reassign instruction. Note that, if unnecessary reassign instructions are not taken into consideration, the limit value N_(L) may be "1".

[0757] Still further, when instructing reassignment, the reassignment part 75 transmits a REASSIGN_BLOCKS command indicating all successive LBA areas in its defect list (refer to FIG. 61). The disk drive 62 assigns an alternate recording area having a physical address allowing successive data transmission to the successive LBA area specified by the REASSIGN_BLOCKS command. Thus, the present disk array device 51 does not degrade its capability before and after executing reassignment, always allowing input/output in real time without a delay in response.

[0758] Still further, when the I/O request SSR requests a read operation, the read/write controller 73 recovers the unread sub-segment after reassignment according to the RAID architecture. The recovered sub-segment is written in the alternate recording area (successive LBA area). On the other hand, when the I/O request SSR requests a write operation, the read/write controller 73 transmits the I/O request SSR to write the sub-segment in the alternate recording area (successive LBA area) after reassignment. The LBA of that sub-segment is not changed before and after reassignment. Thus, the disk array device 51 can maintain consistency in the sub-segments recorded in the disk group 61 before and after reassignment.

[0759] In the present embodiment, for simple and clear description, other successive LBA areas, IDs, process start times, and counters have not been described, but such information for many successive LBA areas is actually registered in the first list 751 and the second list 752. Furthermore, in the actual disk array device 51, the read/write controller 73 may transmit plural I/O requests SSR for one sub-segment. In this case, for the successive LBA area with that sub-segment recorded therein, a plurality of sets of the ID, the successive LBA area, and the process start time are registered in the first list 751.

[0760] Furthermore, in the present embodiment, the reassignment part 75 instructs execution of reassignment. However, if each disk drive 62 executes the conventional reassign method such as auto-reassign independently of the reassignment part 75, the capability of input/output in real time in the entire disk array device 51 can be further improved.

Tenth Embodiment

[0761] FIG. 62 is a block diagram showing the structure of a disk array device 91 according to a tenth embodiment of the present invention. In FIG. 62, the disk array device 91 is constructed according to the RAID architecture of a predetermined level, including a disk group 1001 and a disk controller 1101. Furthermore, the disk array device 91 is communicably connected to the host device 81, as in the first embodiment. Since the disk array device 91 shown in FIG. 62 partially includes the same components as those in the disk array device 51 shown in FIG. 55, the corresponding components in FIG. 62 are provided with the same reference numbers as those in FIG. 55, and their description is omitted herein.

[0762] The disk group 1001 is constructed of two or more disk drives. A logical block address is previously assigned to each recording area in each disk drive. Each disk drive manages its own recording areas by a unit of block (typically, a sector) of a predetermined fixed length (normally, 512 bytes). In the present embodiment, the disk drives in the disk group 1001 are divided into two groups. Disk drives 1002 of one group are normally used for data recording, reading and writing the data (sub-segments and parity), like the disk drives 62 shown in FIG. 55. A spare disk drive 1003 of the other group is used when the alternate areas in a disk drive 1002 become short. The spare disk drive 1003 is used as a disk drive 1002 for recording data after the data recorded in that disk drive 1002 is copied thereto.

[0763] The disk controller 1101 includes the same host interface 72 and disk interface 74 as those in the disk controller 71 of FIG. 55, a read/write controller 1102, a reassignment part 1103, a first storage part 1104, a count part 1105, a second storage part 1106, an address conversion part 1107, and a non-volatile storage device 1108. The read/write controller 1102 is communicably connected to the host interface 72, controlling read or write operations on a sub-segment according to an I/O request SR from the host device 81. The read/write controller 1102 controls read or write operations in cooperation with the address conversion part 1107. The reassignment part 1103 is communicably connected to the disk interface 74, executing reassign processing. The reassignment part 1103 creates the first list 751 and the second list 752 similar to those in the reassignment part 75 of FIG. 55, determining the timing for starting reassign processing. The reassignment part 1103 is different from the reassignment part 75, however, in that the reassignment part 1103 assigns an alternate recording area to a defective recording area by referring to alternate area information 1109 stored in the first storage part 1104. Furthermore, the reassignment part 1103 increments the count part 1105 to count the used amount (or the remaining amount) of the alternate areas whenever the reassignment part 1103 assigns an alternate area. The address conversion part 1107 performs calculation according to the RAID architecture whenever the reassignment part 1103 assigns an alternate area, uniquely deriving the original recording area (LBA) and the current recording area (LBA) of the data. The address conversion part 1107 then stores the derived original recording area and current recording area as address information 11110 in the second storage part 1106 for each disk drive 1002. The non-volatile storage device 1108 will be described last in the present embodiment.

[0764] Described briefly next is the operation of the disk array device 91 on initial activation. In the disk group 1001, a defective fixed-block may already be present in the recording area of a disk drive 1002 or 1003 on initial activation. Further, due to such a defective fixed-block, a recording area unsuitable for the "successive data transmission" described in the ninth embodiment may be present in a disk drive 1002 or 1003. When an unsuitable area is used as an alternate area, input/output in real time is impaired. Therefore, the disk array device 91 executes the processing described in the following on initial activation, detecting the defective fixed-block and also the recording area unsuitable as an alternate area.

[0765] On initial activation, the disk controller 1101 first reserves part of the recording areas included in each disk drive 1002 and the spare disk drive 1003. The disk controller 1101 generates the alternate area information 1109, and stores it in the first storage part 1104. As shown in FIG. 63, the first storage part 1104 manages the alternate areas reserved for each disk drive 1002 or 1003 by dividing them into areas of the size of a sub-segment. The divided areas are used as the alternate areas. Typically, each alternate area is specified by its first LBA. Further, the disk controller 1101 reserves part of the recording areas in each disk drive 1002 or 1003 not only as the alternate areas but also as system areas. As a result, the sub-segments and parity are recorded in the recording areas other than the alternate areas and the system areas in each disk drive 1002 and 1003.

[0766] Each alternate area is used only after reassign processing is executed. A sub-segment or parity is not recorded in an alternate area unless reassign processing is executed. The system areas are areas where information for specifying the alternate areas (that is, the same information as the alternate area information 1109) and the same information as the address information 11110 are recorded. Like the alternate areas, the system areas are managed so that a sub-segment or parity is not recorded therein. When the present disk array device 91 is powered on again after initial activation, the information recorded in the system area of each disk drive 1002 is read into the first storage part 1104 or the second storage part 1106, and used as the alternate area information 1109 or the address information 11110.

[0767] Further, on initial activation, the recording areas in each disk drive 1002 or 1003 are checked as to whether each block of the size of a sub-segment is suitable for successive data transmission, that is, whether the recording area of the size of a sub-segment includes a defective fixed-block. For a recording area determined through this check to include a defective fixed-block, the system area and the alternate area information 1109 are updated so that the recording area is not used as an alternate area and the sub-segment or parity is not recorded therein. An alternate area is assigned to the recording area including the defective block. When it is determined through the check that a recording area reserved as an alternate area includes a defective fixed-block, the LBA of that recording area is deleted from the alternate area information 1109. Such a check is executed through the following procedure, which is described in Japan Standards Association X6053-1996 and others, and will therefore be described only briefly herein.

[0768] The disk controller 1101 first transmits a READ_DEFECT_DATA command, one of the SCSI commands, to each disk drive 1002 or 1003 to extract a defect descriptor indicating the defective area information. The disk controller 1101 extracts information on the defective LBAs from the defect descriptor by using SCSI commands such as a SEND_DIAGNOSTIC command and a RECEIVE_DIAGNOSTIC_RESULTS command. The disk controller 1101 determines that a recording area including a defective LBA (defective fixed-block) is unsuitable for successive data transmission.
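
Given the defective LBAs reported by the drive, the suitability check reduces to flagging every sub-segment-sized area that contains one of them. A sketch with hypothetical area bounds (the command exchange itself is not modeled):

    def unsuitable_areas(areas, defective_lbas):
        """Return the (first_lba, last_lba) areas that contain a defective
        fixed-block and are thus unfit for successive data transmission."""
        bad = set(defective_lbas)
        return [(lo, hi) for lo, hi in areas
                if any(lba in bad for lba in range(lo, hi + 1))]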

[0769] The above check is periodically executed on the recording areas of the sub-segments and parity in each disk drive 1002 or 1003 even during operation of the disk array device 91. When a defective area is detected through this check, an alternate area is assigned to the defective area.

[0770] Described next is the operation to be executed by the read/write controller 1102, with reference to the flow chart of FIG. 64. The host device 81, in the same manner as in the ninth embodiment, specifies the LBA of the segment in the I/O request SR to request the disk array device to execute a read or write operation. Note that the LBA specifying the recording area of the sub-segment is changed before and after reassignment. On this point, the reassign processing clearly differs from that in the ninth embodiment. Therefore, the LBA specified by the I/O request SR may not correctly specify the recording area of the sub-segment. Through processing by the address conversion part 1107 (described later), however, the read/write controller 1102 can obtain the recording area of the sub-segment correctly without any problems.

[0771] When receiving an I/O request SR through the host interface 72, the read/write controller 1102 notifies the address conversion part 1107 of the LBA specified by the I/O request SR (step S281 of FIG. 64). The address conversion part 1107 converts the notified LBA and block length of the I/O request SR into the LBA of the sub-segment according to the RAID architecture. The address conversion part 1107 determines whether an alternate area has been assigned to the LBA of the sub-segment by accessing the address information 11110 managed by the second storage part 1106 (step S282). If an alternate area has been assigned, the address conversion part 1107 fetches the LBA of the alternate area from the address information 11110 and notifies the read/write controller 1102 thereof. If an alternate area has not been assigned, the address conversion part 1107 notifies the read/write controller 1102 of the converted LBA as it is (step S283). As shown in FIG. 65, the address information 11110 is constructed in list form. In that list, the LBA specifying the recording area in which the sub-segment is currently recorded (shown as the current LBA in FIG. 65) is registered for each LBA specifying the original recording area of the sub-segment (shown as the original LBA in FIG. 65). By referring to the address information 11110, the address conversion part 1107 can correctly recognize the LBA specifying the recording area of the sub-segment requested by the I/O request SR, and notifies the read/write controller 1102 thereof.
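
The lookup in steps S282 and S283 is a simple mapping from original LBA to current LBA. A minimal sketch, with hypothetical values:

    # Address information 11110: original LBA of a sub-segment -> current LBA
    # (the first LBA of the alternate area it was reassigned to).
    address_info = {0x2000: 0x9F00}

    def resolve_lba(original_lba, address_info):
        """Steps S282/S283: return the LBA where the sub-segment actually
        resides; an LBA absent from the list was never reassigned and is
        passed through as it is."""
        return address_info.get(original_lba, original_lba)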

[0772] The read/write controller 1102 generates an I/O request SSR in units of sub-segments using the LBA notified by the address conversion part 1107 (step S284). This I/O request SSR includes the LBA specifying the recording area of the sub-segment. The relation between a segment and a sub-segment has been described in the ninth embodiment, and its description is therefore omitted herein. Further, as described in the ninth embodiment, when accessing the recording area of the sub-segment, the disk drive 1002 can successively input/output the sub-segment. The read/write controller 1102 transmits the generated I/O request SSR to the disk drive 1002 through the disk interface 74 (step S285).

[0773] The reassignment part 1103 executes the flow chart shown in FIG. 66, providing timing for executing reassignment (steps S271 to S279). Since the processing of steps S271 to S279 is the same as that of steps S251 to S259, its description is omitted herein. Although the reassignment part 1103 also executes the processing shown in the flow charts of FIGS. 57 to 59, illustration is omitted herein for simplification of the description. When the counter value N ≧ the limit value N_(L) is satisfied, the reassignment part 1103 assumes that the recording area of the sub-segment is defective, and accesses the alternate area information 1109 stored in the first storage part 1104 (refer to FIG. 63) to select an alternate area for the defective area from among the available alternate areas (step S2710). The alternate area is equal in size to the defective area, that is, to the sub-segment, as described above.

[0774] The reassignment part 1103 notifies the address conversion part 1107 of the LBA of the defective area (the LBA specified by the I/O request) and the LBA of the selected alternate area (step S2711). The address conversion part 1107 executes calculation according to the RAID architecture, deriving the LBA specifying the original recording area of the sub-segment (the original LBA) and the LBA specifying its current recording area, that is, the alternate area (the current LBA). The address conversion part 1107 accesses the second storage part 1106 to register the derived original LBA and current LBA in the address information 11110 (refer to FIG. 65) (step S2712). With the address information 11110 updated, the read/write controller 1102 uses the current LBA when another I/O request for the sub-segment subjected to reassignment this time is next generated.

[0775] Further, the reassignment part 1103 updates the alternate area information 1109 stored in the first storage part 1104 so that the alternate area selected in step S2710 is not selected again, terminating the use of the selected alternate area for each disk drive 1002 (step S2713). The processing after step S2713 is shown in the flow chart of FIG. 67 (refer to B in FIG. 66). The count part 1105 includes, as shown in FIG. 68, counters for counting the present used amount (or remaining amount) of the alternate areas. The reassignment part 1103 increments the value of the counter for the disk drive presently subjected to reassign processing by "1" (step S2714 of FIG. 67).
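
Steps S2710, S2713, and S2714 together take one free alternate area from the per-drive free list and count its use. A sketch under the same hypothetical naming (the drive key and LBAs are illustrative):

    # Alternate area information 1109: per-drive list of free alternate areas,
    # each identified by its first LBA. used_counters mirrors the count part 1105.
    alternate_info = {"drive0": [0x9F00, 0x9F40, 0x9F80]}
    used_counters = {}

    def assign_alternate_area(drive, alternate_info, used_counters):
        """Take one alternate area (step S2710), bar it from reuse (step S2713),
        and increment the drive's used-amount counter N_v (step S2714)."""
        area = alternate_info[drive].pop(0)
        used_counters[drive] = used_counters.get(drive, 0) + 1
        return area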

[0776] As described above, reassign processing is also executed in the present embodiment, and an alternate area is assigned to a defective area. When the I/O request SSR requests a write operation, the read/write controller 1102 instructs the disk drive 1002 subjected to reassign processing to write the sub-segment. When the I/O request SSR requests a read operation, the read/write controller 1102 recovers the unread sub-segment, transmitting it to the host device 81 and instructing the disk drive 1002 subjected to reassign processing to write the recovered sub-segment. Thus, as in the ninth embodiment, the data recorded in the disk drives 1002 can maintain consistency before and after reassignment.

[0777] Further, when the alternate area information 1109 and the address information 11110 are updated in the above-described manner, the disk controller 1101 stores the updated information in the system areas reserved in each disk drive 1002 and 1003.

[0778] Every time the processing in steps S271 to S2714 is executed on the same disk drive 1002, the alternate areas in that disk drive 1002 are further consumed. In such a disk drive 1002, the alternate areas are eventually all consumed, making the disk drive unsuitable as an area for recording data. Thus, in step S2715, which follows step S2714, the reassignment part 1103 checks whether or not the counter value N_(v) counting the used amount of the alternate areas in the disk drive 1002 has reached a predetermined limit amount V_(L), to determine whether the disk drive 1002 is suitable for recording data. As described above, the counter value N_(v) of each counter indicates the used amount (or the remaining amount) of the alternate areas reserved for each disk drive 1002. That is, in step S2715, when the counter value N_(v) reaches the limit amount V_(L), the reassignment part 1103 assumes that the disk drive 1002 is unsuitable for recording data because of a shortage of the alternate areas. The limit amount V_(L) is appropriately selected in consideration of the size of the alternate areas previously reserved in each disk drive 1002.
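
The step S2715 decision then compares the drive's counter value N_(v) against the limit amount V_(L); a one-function sketch (the value of V_L is hypothetical):

    V_L = 3  # hypothetical limit amount, sized to the reserved alternate areas

    def drive_needs_spare(drive, used_counters, limit=V_L):
        """Step S2715: once N_v reaches V_L, the drive is deemed unsuitable and
        its data is copied to the spare disk drive (step S2716)."""
        return used_counters.get(drive, 0) >= limit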

[0779] When determining in step S2715 that the disk drive 1002 is unsuitable for recording data, the reassignment part 1103 ceases to use that disk drive 1002 for data recording, and determines to use the spare disk drive 1003. In response to this determination, the disk controller 1101 controls the disk group 1001 to copy the data (sub-segments, parity, and data recorded in the system area) recorded in the disk drive 1002 to the spare disk drive 1003 (step S2716). After this copy control ends, the disk controller 1101 updates the address information 11110 to provide consistency between the original LBA and the current LBA. Thus, even when receiving an I/O request SR specifying the original LBA from the host device 81, the read/write controller 1102 can fetch the current LBA of the sub-segment from the address conversion part 1107. In other words, the disk controller 1101 can correctly recognize the spare disk drive 1003 as the disk drive for recording data. Therefore, the host device 81 is not required to recognize the replacement of the disk drive 1002 with the spare disk drive 1003 in the disk group 1001.

[0780] When determining in step S2715 that the disk drive 1002 is suitable for recording data, the reassignment part 1103 returns to step S271 (refer to C) to use the disk drive 1002 for recording data.

[0781] As described above, according to the present embodiment, the reassignment part 1103 selects the alternate area by referring to the alternate area information 1109 of the disk drive 1002 subjected to reassignment. All of the alternate areas registered in the alternate area information 1109 have been determined to be suitable for successive data transmission (not requiring unnecessary seek time or rotational latency) through the check on initial activation of the present disk array device 91. Thus, the present disk array device 91 can suppress additional occurrence of delays in response, allowing input/output of sub-segments in real time after reassignment.

[0782] On initial activation and regularly during operation, the recording areas of the sub-segments and parity in each disk drive 1002 and 1003 are checked as to whether they are suitable for successive data transmission. An alternate area is assigned to any recording area determined to be unsuitable through this check. Thus, in the disk array device 91, the recording areas of the sub-segments and parity are always kept suitable for successive data transmission, and unnecessary occurrence of delays in response can be prevented.

[0783] Furthermore, in the present disk array device, when the alternate areas of a data disk drive 1002 become short, the spare disk drive 1003 is used as that disk drive 1002. The sub-segments and parity recorded in the disk drive 1002 with a shortage of alternate areas are copied to the spare disk drive 1003. When a disk drive 1002 with a shortage of alternate areas is continuously used for a long time, unnecessary delays in response tend to occur. In the present disk array device 91, however, use of the spare disk drive 1003 prevents the capability from being impaired by such delays in response.

[0784] The first storage part 1104 and the second storage part 1106 are often constructed of a volatile storage device. Therefore, when the disk array device 91 is powered off, the alternate area information 1109 and the address information 11110 are deleted. In the system areas reserved in each disk drive 1002, however, the alternate area information 1109 and the address information 11110 can be recorded. In the present embodiment, the alternate area information 1109 and the address information 11110, both of which are updated whenever reassignment is executed, are recorded in the system areas when the present disk array device 91 is powered off. It is therefore not required for the disk controller 1101 to additionally include an expensive non-volatile storage device for storing the alternate area information 1109 and the address information 11110.

[0785] Described next is the non-volatile storage device 1108 shown in FIG. 62. In the disk array device 91, a system area is reserved in each disk drive 1002 and 1003. In the system area, information similar to the address information 11110 is recorded, as described above. In some cases, however, a disk drive 1002 or 1003 may be removed from the disk array device 91 while the disk array device 91 is powered off. If powered on without that disk drive 1002 or 1003, the disk array device 91 may not be activated normally. Therefore, the non-volatile storage device 1108 is provided in the disk controller 1101, storing the address information 11110. When the disk array device 91 is powered on, the address information 11110 is read from the non-volatile storage device 1108 into the second storage part 1106. The present disk array device can thus be activated normally. Furthermore, in the disk array device 91, an alternate area may be assigned to the system area in a disk drive 1002 or 1003. In this case, the storage device 1108 stores the original LBA and the current LBA of the system area. The disk controller 1101 reads the current LBA of the system area from the storage device 1108, and then accesses that current LBA in the disk drive 1002 or 1003, thereby correctly accessing the system area.

[0786] In the ninth and tenth embodiments, the alternate area is an area in which the overhead at the time of a read or write operation of the disk drives 62 and 1002 is within a predetermined range. The alternate area may, however, be an area in which the time required for read and write operations is within a predetermined range, in consideration of input/output in real time. Furthermore, in the ninth and tenth embodiments, the reassign timing determined by the reassignment parts 75 and 1103 is when the delay time T_(D) > the limit time T_(L) is satisfied successively a predetermined number of times for the same recording area in the same disk drive 62 or 1002. However, the reassign timing may be when the delay time T_(D) > the limit time T_(L) is satisfied M times (M is a natural number of 1 or more with M < N) in the recent N read or write operations (N is a natural number of 2 or more) for the same recording area in the same disk drive 62 or 1002. Further, the reassign timing may be when the average value of the delay times required in the recent N read or write operations (N is a natural number of 2 or more) exceeds a predetermined threshold. In other words, the reassign timing may be determined in any manner as long as it is based on the delay time T_(D) measured from the process start time of the I/O request SSR.
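
The two alternative criteria named here (M of the recent N operations over the limit, or the average of the recent N delays over a threshold) could be evaluated as follows; this is a sketch of the variants, not the embodiment's own method:

    def m_of_n_exceeded(recent_delays, t_limit, m, n):
        """Reassign when T_D > T_L held at least M times in the last N operations."""
        window = recent_delays[-n:]
        return len(window) == n and sum(d > t_limit for d in window) >= m

    def average_exceeded(recent_delays, threshold, n):
        """Reassign when the mean delay of the last N operations exceeds a threshold."""
        window = recent_delays[-n:]
        return len(window) == n and sum(window) / n > threshold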

[0787] In the tenth embodiment, the alternate area is equal to the sub-segment in size, that is, of a fixed length. However, the first storage part 1104 may manage the recording areas allowing successive data transmission as recording areas of a variable length, and the reassignment part 1103 may select an alternate area of the required size from the alternate area information 1109 when executing reassignment.

[0788] While the invention has been described in detail, the foregoing description is in all aspects illustrative and not restrictive. It is understood that numerous other modifications and variations can be devised without departing from the scope of the invention.

What is claimed is:
1. A disk array device for executing a read operation for reading data recorded therein in response to a first read request transmitted thereto, said disk array device having recorded therein data blocks generated by dividing the data and redundant data generated from the data blocks, said disk array device comprising: m disk drives across which the data blocks and the redundant data are distributed; and a control part operable to control the read operation, wherein said control part is operable to: issue second read requests to read the data blocks and the redundant data from said m disk drives in response to the first read request sent thereto; detect a disk drive, from among said m disk drives, from which reading of either one of the data blocks or the redundant data is no longer necessary; and issue a read termination command to terminate reading of the one of the data blocks or the redundant data by said detected disk drive, wherein said detected disk drive is enabled to commence reading of any subsequent data block or redundant data without being disconnected from said disk array device.
2. The disk array device according to claim 1, wherein when (m-1) of said m disk drives complete reading, said control part is operable to: determine that reading being executed in one remaining disk drive, as said detected disk drive, is no longer necessary; and issue a read termination command to said one remaining disk drive.
3. The disk array device according to claim 1, wherein when detecting that two or more of said m disk drives cannot complete reading, said control part is operable to: determine that reading being executed in other disk drives is no longer necessary; and issue a read termination command to the determined other disk drives.
4. The disk array device according to claim 1, wherein when (m-1) of said m disk drives complete reading, said control part is operable to: determine that reading not yet being executed in one remaining disk drive of said m disk drives is no longer necessary; and issue a read termination command to said one remaining disk drive.
5. A disk array device for executing a read operation for reading data recorded therein in response to a first read request from a host device, said disk array device with data blocks generated by dividing the data and redundant data generated from the data blocks recorded therein, said disk array device comprising: m disk drives across which the data blocks and the redundant data are distributed, wherein m≧2; a parity calculation part operable to calculate parity from (m-2) of the data blocks and the redundant data to recover one remaining data block; and a control part operable to control the read operation; wherein said control part is operable to: in response to the first read request sent thereto, refer to a faulty block table and determine whether or not (m-1) of said m disk drives have previously failed to read each of the data blocks; when determining that said (m-1) disk drives have not previously failed to read each of the data blocks, issue second read requests to said (m-1) disk drives to read only each of the data blocks; when the data blocks are read from said (m-1) disk drives, execute an operation for transmitting the data to the host device; and when determining that said (m-1) disk drives have previously failed to read each of the data blocks, issue second read requests to said m disk drives to read (m-1) of the data blocks and the redundant data.
6. The disk array device according to claim 5, wherein said control part is operable to: when said (m-1) disk drives complete reading, detect whether or not a set of the data blocks and the redundant data has been read from said (m-1) disk drives; when detecting that the set of the data blocks and the redundant data has been read, issue a recovery instruction to said parity calculation part to recover the one remaining data block not read from one remaining disk drive of said m disk drives; and when the one remaining data block is recovered by the calculation of parity in said parity calculation part, execute an operation for transmitting the data to the host device.
7. The disk array device according to claim 6, further comprising: a table for registering therein a recording area of a data block which has previously failed to be read by said (m-1) disk drives, wherein said control part is operable to determine whether to issue the second read requests to said (m-1) disk drives or to said m disk drives.
8. The disk array device according to claim 7, further comprising: a reassignment part operable to, when a defect occurs in a recording area of one of the data blocks or the redundant data in said m disk drives, execute reassign processing for assigning an alternate recording area to the defective recording area, wherein when said reassignment part assigns the alternate recording area to the defective recording area of the data block registered in said table by said reassignment part, said control part is operable to delete the defective recording area of the data block from said table.
9. The disk array device according to claim 8, wherein each of said m disk drives has an alternate recording area previously reserved therein, and said disk array device further comprises: a first table storage part operable to store a first table for registering an address of the alternate recording area reserved in each of said m disk drives as alternate recording area information; and a second table storage part operable to store a second table for registering address information of the alternate recording area assigned to the defective recording area, wherein said reassignment part is operable to: when the second read requests are transmitted from said control part to said m disk drives, measure a delay time in each of said m disk drives; determine whether or not each of the recording areas of the data blocks and the redundant data to be read by each of the second read requests is defective based on the measured delay time; when determined that the recording area is defective, assign the alternate recording area to the defective recording area based on the alternate recording area information registered in the first table of said first table storage part; and register the address information of the assigned alternate recording area in the second table of said second table storage part, wherein said control part is operable to issue the second read requests based on the address information registered in the second table of said second table storage part, and wherein the delay time is a time period calculated from a predetermined process start time.
10. The disk array device according to claim 1, further comprising: a reassignment part operable to, when a defect occurs in a recording area of one of the data blocks or the redundant data in said m disk drives, execute reassign processing for assigning an alternate recording area to the defective recording area.
11. The disk array device according to claim 10, wherein each of said m disk drives has an alternate recording area previously reserved therein, and said disk array device further comprises: a first table storage part operable to store a first table for registering an address of the alternate recording area reserved in each of said m disk drives as alternate recording area information; and a second table storage part operable to store a second table for registering address information of the alternate recording area assigned to the defective recording area, wherein said reassignment part is operable to: when the second read requests are transmitted from said control part to said m disk drives, measure a delay time in each of said m disk drives; determine whether or not each of recording areas of the data blocks and the redundant data to be read by each of the second read requests is defective based on the measured delay time; when determined that the recording area is defective, assign the alternate recording area to the defective recording area based on the alternate recording area information registered in the first table of said first table storage part; and register the address information of the assigned alternate recording area in the second table of said second table storage part, wherein said control part is operable to issue the second read requests based on the address information registered in the second table of said second table storage part, and wherein the delay time is a time period calculated from a predetermined process start time.
12. The disk array device according to claim 11, wherein said reassignment part is operable to assign the alternate recording area to the defective recording area only when determining successively a predetermined number of times that the recording area is defective.
13. The disk array device according to claim 11, wherein the predetermined process start time is a time when each of the second read requests is transmitted to each of said m disk drives.
14. The disk array device according to claim 11, wherein the predetermined process start time is a time when said m disk drives start reading based on the second read requests.
15. The disk array device according to claim 1, wherein said disk array device further comprises m SCSI interfaces corresponding to said m disk drives, and wherein said control part is operable to notify each of said m SCSI interfaces of a storage location selected from a storage area in each of said m disk drives, respectively.
16. The disk array device according to claim 5, wherein said disk array device further comprises m SCSI interfaces corresponding to said m disk drives, and wherein said control part is operable to notify each of said m SCSI interfaces of a storage location selected from a storage area in each of said m disk drives, respectively.